EXECUTIVE SUMMARY


PURPOSE 

During the past several decades, problems of environmental contamination
have become increasingly important from both the scientific and the legal
standpoints. In recent years a great deal of attention has been directed to
the potential toxicity to aquatic organisms of chemicals discharged into
water bodies.

The U.S. Army, through activities such as munitions manufacture,
operates a number of plants that produce, consume, or discharge a variety
of chemical substances. Some of these discharges enter bodies of water
inhabited by various aquatic species. Thus the Army must provide the
USEPA with safety data concerning the levels of such discharges and the
possible extent of resulting surface water and ecological contamination.

In  order  to  develop  such  data  the  Army  conducts  both  intramural  and 
extramural  programs  of  aquatic  toxicity  testing. 

Considerable amounts of time, money, and manpower are expended by the
Army in such aquatic toxicity testing programs. To make these programs
more efficient and more effective, the need has been felt for a
reexamination of some of the standard methods used. This has been
especially true of the statistical methods involved in the design of
testing programs and the analysis of the resulting data. Feder and
Collins [1] have considered a number of the statistical aspects of the
design and analysis of chronic aquatic toxicity tests with fathead
minnows. The present study considers statistical aspects of the design
and analysis of chronic tests with Daphnia magna.

Many of the statistical methods used in this study were adapted from
those used by Feder and Collins [1] in their analysis of data from fathead
minnow chronic toxicity tests. Other statistical methods discussed and/or
developed in this study do not appear in [1]; some have not been previously
applied to aquatic toxicity data. Compared with standard methodology, the
methods discussed in [1] and in this report provide more information about
the structure, relations, and anomalies in the data.

Thus the statistical design and analysis considerations for Daphnia magna
tests discussed in this report, in conjunction with those discussed by
Feder and Collins for fathead minnow tests, should improve the design,
reporting, and statistical analysis of data from chronic aquatic toxicity
tests. This will enhance the sensitivity of the conclusions that can be
derived from these tests, thereby increasing their efficiency.


APPROACH 


The approach used in this report is much like that used by Feder and
Collins [1]. A variety of topics pertaining to data display, statistical
analyses, and experimental design are discussed in detail. For many of
the topics, alternative statistical approaches and procedures are presented.
All the statistical procedures are illustrated with examples based on real
data from chronic tests with Daphnia magna. The data were kindly provided
by several investigators at different laboratories and represent a number
of variations in the design of chronic Daphnia tests.

The statistical procedures discussed in the body of the report represent
a combination of methods that have been previously applied to aquatic
toxicity data, methods that are in the statistical literature but which
have not been commonly applied to aquatic toxicity data, and methods that
have been especially developed or extended for this study.

RESULTS 

A number of procedures for the statistical analysis of Daphnia magna
chronic toxicity tests are discussed. The suggested procedures are
illustrated with examples based on various chronic toxicity tests. The data
used for illustration reflect a number of the possible variants in test
design. Namely, some of the tests are flow-through while others are static.
Some contain multiple daphnids per beaker while others contain a single
daphnid per beaker. Some contain solvent control groups while others do
not. Numbers of daphnids per group vary from ten to eighty, while numbers
of beakers per group vary from three to ten.

Data analysis topics discussed include preliminary and residual graphical
displays; preliminary tests of beaker to beaker heterogeneity within
groups for mortality and length responses; adjustments to account for
such heterogeneity; outlier detection tests; comparisons of average
mortality, length, and reproduction levels between water control and
solvent control groups, along with conceptual implications of discrepancies
that might be found; overall tests of heterogeneity in response levels
across treatment groups; treatment group-control group pairwise multiple
comparison and confidence interval procedures; the fitting of dose
response curve models to mortality, reproduction, and length responses;
point and confidence interval estimation, based on these fits, of
concentrations associated with biologically significant increases or
decreases in response levels relative to the controls; statistical
precision to be expected from tests as a function of sample sizes,
variability of responses, and extent of beaker to beaker heterogeneity
within groups; a rationale for unequal allocation of test beakers among
treatment groups, with greater numbers of beakers in the control group
and lower treatment groups and lesser numbers of beakers in the higher
treatment groups; analysis of time trends in reproduction; and analysis
of time to death.


CONCLUSIONS  AND  RECOMMENDATIONS 


Several of the conclusions and recommendations from this study are similar
to those arrived at by Feder and Collins [1]. We state some of them again
here for completeness.

1. Standardized conventions and formats for reporting test design,
laboratory conditions under which the test was carried out, and test
results would facilitate communication among investigators, laboratories,
and government regulatory agencies and would lead to greater
reproducibility of test results across laboratories and across time.

2.  Some  of  the  "standard"  methods  currently  used  for  analyzing  data 
from  aquatic  toxicity  tests  can  and  should  be  modified.  The  data 
should  first  be  graphed,  outlying  observations  or  groups  of 
observations  should  be  located  and  the  reason  for  their  aberrant 
behavior  determined,  and  tests  for  heterogeneity  among  beakers 
within  groups  should  be  carried  out.  Subsequent  comparisons  of 
response  levels  across  groups  should  take  into  account  such 
heterogeneity  or  aberrant  values. 

3. Whenever solvent controls are included in the test, their responses
should be compared with those of the water controls. If no differences
are evident, the two control groups may be combined for comparisons with
the treatment groups. If there is evidence of differences between the two
control groups, then the solvent control group would usually be used for
comparisons with the treatment groups. However, the test results are then
at best tentative, since there is no way to determine whether the observed
toxic effects were due to the toxicant, to the solvent, or to some
interaction between them.

4. If hypothesis tests are to be used to compare the treatment group
and control group responses, they should be one-sided tests, which are
sensitive to alternatives in a particular direction, rather than overall
analysis of variance type "shotgun" tests.

5. Multiple comparison procedures and confidence interval procedures
should be used to determine specifically which treatment groups have
responses that differ from the control group responses and whether the
differences are of biological significance. Significance tests, by
themselves, are not adequate to define an MATC*. Confidence bounds
should be routinely constructed at the MATC to determine just how much
worse than the control group the response at that concentration could
conceivably be. In general, confidence intervals impart much more
information than hypothesis tests and should be routinely used.

6. A way to impose monotonicity or smoothness structure on the
responses, to smooth the data, and to convert a hypothesis testing
problem into an estimation problem is to fit dose response curve
models to the data and to define the "safe" concentration as that
which results in no more than a specified increment in response
from the control group. Dose response curves for mortality
responses may be based on standard probit or logit models or
nonstandard generalizations of these; dose response curves for
length, reproduction, weight, or other quantitative responses
may be based on multiple regression models such as polynomials
or mechanistically motivated nonlinear forms.

*Maximum acceptable toxicant concentration.

7. Statistical power and precision depend on the number of daphnids per
group, the number of beakers per group, and the extent of beaker to beaker
and daphnid to daphnid response variability. In the presence of
substantial beaker to beaker heterogeneity, the effective sample size per
group may be closer to the number of beakers than to the number of
daphnids. It is thus good design practice to divide the daphnids within
each treatment or control group among as many beakers as can be
accommodated within cost and logistical constraints.

8.  Under  certain  circumstances  it  is  sensible  to  allocate  experimental 
resources  so  that  the  control  group  and  lower  concentration  groups 
receive  more  beakers  and  daphnids  than  the  higher  concentration 
groups.  This  may  result  in  greater  inference  sensitivity  in  the 
region  of  the  MATC.  Proportional  diluters  now  have  the  capability 
to  permit  such  asymmetrical  allocations. 

9.  Statistical  power  or  statistical  precision  goals  should  be  stated 
as  part  of  the  protocol  for  each  individual  toxicity  test  and 
sample  sizes  should  be  determined  accordingly. 

10. Useful information can be obtained by studying concentration related
trends in reproduction over time and in time to death. Since these two
responses are routinely determined under standard protocols at least
three times per week, they should be reported at the intervals at which
they are observed, to permit statistical analysis of these trends. Costs
permitting, consideration should be given to reporting reproduction and
mortality more frequently than three times per week, perhaps daily.
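Recommendation 7 can be made quantitative with the standard design-effect approximation for clustered observations, n_eff = N / (1 + (m - 1)ρ), where N is the number of daphnids in a group, m the number per beaker, and ρ the intra-beaker correlation of responses. The Python sketch below is only an illustration under stated assumptions: equal numbers of daphnids per beaker and a single, known ρ. The function name and the value ρ = 0.1 are hypothetical, not estimates from the tests in this report.

```python
def effective_sample_size(n_beakers: int, per_beaker: int, rho: float) -> float:
    """Approximate number of independent daphnids a clustered group is worth.

    n_beakers  -- replicate beakers (clusters) in the group
    per_beaker -- daphnids per beaker, assumed equal across beakers
    rho        -- assumed intra-beaker correlation of responses, 0 <= rho <= 1
    """
    total = n_beakers * per_beaker
    design_effect = 1.0 + (per_beaker - 1) * rho
    return total / design_effect


if __name__ == "__main__":
    # The same 80 daphnids per group, divided among more or fewer beakers.
    for beakers, per_beaker in [(4, 20), (8, 10), (16, 5)]:
        n_eff = effective_sample_size(beakers, per_beaker, rho=0.1)
        print(f"{beakers:2d} beakers x {per_beaker:2d} daphnids: n_eff = {n_eff:.1f}")
```

With ρ = 0.1 the three layouts give effective sample sizes of about 27.6, 42.1, and 57.1. With ρ = 1 the effective size equals the number of beakers, which is the limiting case described in recommendation 7.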
INTRODUCTION


During the 1960's and 1970's, environmental contamination in general, and
water pollution specifically, became increasingly important as legal and
scientific problems. Regulatory agencies needed scientific data to support
the notion that a problem existed and also needed factual information for
establishing tolerance limits for levels of chemical discharges into
surface waters. From that need evolved numerous standard toxicity tests.

Aquatic toxicologists and biologists developed, refined, and standardized
many of the biological, chemical, and operational factors pertaining to
such tests. However, the statistical aspects of test design and of the
analysis of the resulting data have lagged behind.

Operational  activities  of  the  U.S.  Army  (e.g.  munitions  manufacture) 
involve  the  production,  use,  and/or  discharge  of  a  variety  of  commercial 
chemicals.  Safety  data  must  be  provided  to  USEPA  concerning  surface 
water  contamination  due  to  discharges  of  chemical  intermediates  or  the 
final  product.  The  Army  conducts  both  intramural  and  extramural  programs 
of  aquatic  toxicity  testing  to  develop  such  data. 

Considerable amounts of time, money, and manpower are expended by the
Army in such aquatic toxicity testing programs. To make these programs
more efficient and more effective, the need has been felt for a
reexamination of some of the standard methods used. This has been
especially true of the statistical methods involved in the design of
testing programs and the analysis of the resulting data.

This report contains the results of the second phase of a study of
statistical methods in aquatic toxicology. The first phase pertained to
chronic tests with fathead minnows; the results are presented in Feder
and Collins [1]. The present study considers statistical aspects of the
design and analysis of chronic tests with Daphnia magna.

Many of the statistical methods used in this study were adapted from
those used by Feder and Collins [1] in their analysis of data from
fathead minnow chronic toxicity tests. Other statistical methods discussed
and/or developed in this study do not appear in [1]; some have not been
previously applied to aquatic toxicity data, and some have been especially
developed or extended for this study. The suggested procedures are
illustrated with examples based on various Daphnia chronic toxicity
tests. The data used for illustration reflect a number of possible
variants in test design.

Topics discussed include preliminary and residual graphical displays;
preliminary tests of beaker to beaker heterogeneity within groups;
adjustments to account for such heterogeneity; outlier detection tests;
comparisons of average mortality, length, and reproduction levels between
water control and solvent control groups, along with conceptual
implications of discrepancies that might be found; overall tests of
heterogeneity in response levels across treatment groups; treatment
group-control group pairwise multiple comparison and confidence interval
procedures; the fitting of dose response curve models to mortality,
reproduction, and length responses; point and confidence interval
estimation, based on these fits, of concentrations associated with
biologically significant increases or decreases in response levels
relative to the controls; statistical precision to be expected from tests
as a function of sample sizes, variability of responses, and extent of
beaker to beaker heterogeneity within groups; a rationale for unequal
allocation of test beakers among treatment groups, with greater numbers of
beakers in the control group and lower treatment groups and lesser numbers
of beakers in the higher treatment groups; analysis of time trends in
reproduction; and analysis of time to death.

It is hoped that the results obtained in this study will contribute
to better, more reliable toxicity tests and data analyses. This in turn
should provide improved tools for the regulation of toxic chemicals in
aquatic environments and should suggest fertile areas for further study
and development.


I. CHRONIC DAPHNID TOXICITY TESTS - BASIC DATA


The data that we will be concerned with pertain to 21 day chronic
Daphnia studies. There are a number of variations across laboratories in
the way the tests were carried out and the types of responses that were
recorded.

Some  of  the  variations  in  technique  do  not  affect  statistical  analysis 
of  the  data,  although  they  are  very  important  from  a  biological  standpoint. 
Dr.  William  van  der  Schalie  has  compiled  a  collection  of  these  biological 
parameters,  which  we  indicate  below  in  Table  1.1. 

Other  test  parameters  strongly  affect  the  responses  that  can  be  measured 
and  the  kinds  of  data  analyses  that  can  be  carried  out.  Among  these  are: 


Number of replicate beakers (test chambers) per treatment
Number of daphnids per replicate beaker (test chamber)
    individual daphnids
    multiple daphnids
Randomization procedures
    test chamber to position
        random, systematic, blocking
    daphnids to test chambers
        complete randomization
        blocking - e.g. on broods
Responses measured and frequency measured
    live/dead - measured frequently or only once or twice during test
    numbers of offspring produced (total offspring or live offspring),
    at the per replicate or per female level - measured frequently or
    only once or twice during test
    lengths
Number of control groups
    water control
    solvent or carrier control


TABLE  1.1  BIOLOGICAL,  PHYSICAL,  AND  CHEMICAL  SUMMARY  PARAMETERS  OF  CHRONIC 
DAPHNIA  TOXICITY  TESTS  (COMPILED  BY  DR.  WILLIAM  VAN  DER  SCHALIE) 

Test  Data  Source: 

Toxicant  (Name  or  Code) : 

Daphnid  Chronic  Toxicity  Test  Data  Sheet 

Culture  Information 

Food  Composition: 

Frequency  of  Feeding: 

Water  Flow  (static  renewal,  or  flow-through): 

Indicate  frequency  of  renewal  or  flow  rate  in  tank  volumes/day: 

Age  of  Adults  used  to  obtain  test  Daphnids: 

Temperature  (average  and  range) : 

Photoperiod  and  Lighting  (quality  and  intensity) : 

Was  culture  water  used  the  same  as  for  the  test  dilution  water? 

If  not,  how  did  the  water  quality  differ: 

Test  Information  (If  same  as  for  brood  culture,  please  enter  "same") 

Food  Composition: 

Frequency  of  Feeding: 

Water  Flow  (static  renewal,  or  flow-through): 

Indicate frequency of renewal or flow rate in tank volumes/day:

Age  of  Daphnids  at  start  of  test: 

Dilution  Water  Quality  (indicate  average  and  range  during  test,  if 
possible) : 

Source  (tap,  well,  river,  reconstituted,  etc): 
pH: Conductivity:

Alkalinity:  Temperature: 

Hardness:  Other: 


Dissolved  oxygen: 




Duration  of  Test: 

Type  of  Test  Container  and  Volume: 

Nominal  Concentrations: 

Dilution  Factor  Between  Concentrations : 

Number  of  Treatment  Levels: 

Number  of  Replicates  per  Treatment  Level: 

Number  of  Daphnids  per  Replicate: 

Describe  randomization  procedures  (daphnids  to  test  containers  and 
test  containers  to  position): 

Indicate which of the following biological endpoints were measured
and the frequency of measurement:

Survival:

Growth:

Days to first young:

Young production:

Total young per replicate:

Young per female:

Young per female per reproductive day:

Young  per  brood: 

Dead  or  aborted  young: 

Other : 

Control responses (indicate whether 21 or 28 days)

Mortality ( ):

Young  per  female: 


Were  ephippia  formed: 


The  data  sets  we  have  received  represent  a  number  of  variations  on 
the  types  of  tests  that  are  run.  We  have  received  five  data  sets. 

1,  2.  Gerald  LeBlanc  -  EG&G  Bionomics  -  Compounds  "A"  and  "B" 

Flow-through  test  -  21  day  test 

water  control  group,  solvent  control  group,  5  treatment  groups 

4  replicate  test  chambers  per  group 
20  daphnids  per  beaker  to  start 
Responses  measured 

7,  14,  21  day  survival 

7,  14,  21  day  cumulative  offspring  per  surviving  female 
7,  21  day  lengths  (on  individual  daphnids) 

Concentrations  determined  in  two  of  four  test  chambers  on 
days  0,  7,  14,  21  -  Average  concentration  determinations 
used  for  analysis 

3. Bill Adams - Selenium

Static  renewal  test  -  21  day  test 
1  control  group,  7  treatment  groups 
3  replicate  test  chambers  per  group 

5  daphnids  per  beaker  to  start 
Responses  measured 

2,  4,  7,  9,  11,  14,  16,  18,  21  day  survival 

2,  4,  7,  9,  11,  14,  16,  18,  21  day  cumulative  offspring 

per  surviving  female 

no  length  data 

Averages of measured concentrations will be used for analysis

4. Gary Chapman - Beryllium

Static  renewal  test  -  21  day  test 

water  control  group,  solvent  control  group,  6  treatment  groups 
10  replicate  test  chambers  per  group 
individual  daphnids  per  test  chamber 
Responses  measured 

3,  5,  7,  10,  12,  14,  17,  19,  21  day  survival 

3, 5, 7, 10, 12, 14, 17, 19, 21 day # offspring (both live
and dead) and # broods
21  day  lengths 

Averages  of  measured  concentrations  used  for  analysis 


5. Clyde Goulden - Isophorone

Static  renewal  test  -  21  day  test 
control  group,  5  treatment  groups 
10  test  chambers  per  group 

7  beakers  with  individual  daphnids  -  to  determine 
individual  production  figures 
3  beakers  with  5  daphnids  per  beaker  -  to  get  survival 
information 
Responses  measured 

2,  4,  6,  8,  11,  13,  15,  18,  21  day  survival  data 


2,  4,  6,  8,  11,  13,  15,  18,  21  day  #  offspring  (live) 
from  the  daphnids  in  the  beakers  with  just  one 
daphnid  per  beaker 
no  length  data 

Nominal concentrations used for analysis


Note on Concentrations Used for Analysis Purposes

The various control groups and treatment groups correspond to nominal
toxicant concentrations. As part of good experimental practice, periodic
determinations are made of the chemical concentrations in the various
beakers. The variation in these determinations is due in part to
fluctuations over time in toxicant concentrations within the beakers and
in part to analytical errors. Furthermore, there may be variation in
concentration levels among test chambers within groups due to random
variations in the delivery system (e.g., partially blocked tubes) or
varying amounts of settling out of toxicant in the various test chambers.
However, chemical determinations are not necessarily made in every chamber
or at equal intervals. We adopt the (somewhat arbitrary) conventions that:

1.  If  no  chemical  determinations  are  made  during  the  experiment 
or  if  the  results  are  not  reported,  we  utilize  the  nominal 
concentrations  in  subsequent  statistical  analyses. 

2.  If  chemical  determinations  are  made  within  each  group  in 
representative  test  chambers  at  periodic  intervals,  then  we 
associate  the  average  of  all  these  determinations  with  the 
concentration  in  each  beaker  within  the  group.  That  is,  we 
utilize  a  single  toxicant  concentration  over  time  and 
across  beakers  within  groups. 

In flow-through tests, theoretically all beakers within a group
receive the same water and so should have the same concentrations.
In static tests, this may not be so.

The problem of how to account for fluctuations in toxicant concentrations
in the statistical analysis is an interesting one; however, we will not
pursue it here.
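The two conventions above amount to a simple rule for the single concentration assigned to every beaker in a group. A minimal Python sketch, assuming the determinations for a group are collected in one list with `None` marking a determination that is missing (a hypothetical encoding); the function name and the numbers are illustrative and are not taken from any of the tests:

```python
from statistics import mean
from typing import Optional, Sequence


def analysis_concentration(nominal: float,
                           determinations: Sequence[Optional[float]]) -> float:
    """Concentration used for every beaker in a group, over the whole test.

    Convention 1: no determinations made or reported -> nominal concentration.
    Convention 2: otherwise -> average of all determinations in the group,
    pooled over sampling days and over the chambers that were sampled.
    """
    measured = [d for d in determinations if d is not None]
    if not measured:
        return nominal
    return mean(measured)


# Hypothetical determinations (mg/l) for one group; None = missing value.
group_3 = [0.011, 0.013, 0.012, None, 0.014, 0.010]
print(analysis_concentration(0.0120, group_3))  # average of the five values
print(analysis_concentration(0.0120, []))       # falls back to nominal
```

In flow-through tests this pooling matches the expectation that all beakers in a group receive the same water. In static tests it deliberately ignores chamber-to-chamber differences, as the second convention states.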

Different investigators and different laboratories report their data
in different formats, in different styles, and at varying intervals. For
example, LeBlanc reports survival and production at days 7, 14, and 21,
while Adams, Chapman, and Goulden report them at more frequent intervals.
LeBlanc and Chapman report lengths while Adams and Goulden do not. Some
investigators report cumulative production versus time while other
investigators report current production versus time. Some investigators
report measured toxicant concentrations while others report only nominal
concentrations.
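Converting between the cumulative and current forms of the production data is a matter of differencing or summing over observation days. The sketch below (hypothetical function names and values) assumes the input is a count of offspring per beaker or per individual daphnid. Cumulative values expressed per surviving female cannot simply be differenced if, as the name suggests, they are cumulative totals divided by the current number of survivors. In that case the divisor changes whenever a female dies, so such values would first have to be converted back to totals using the survivor counts.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_to_current(cumulative: Sequence[float]) -> List[float]:
    """Offspring produced in each interval, from running totals.

    The first value is taken relative to zero offspring at the start of the test.
    """
    previous = [0] + list(cumulative[:-1])
    return [c - p for c, p in zip(cumulative, previous)]


def current_to_cumulative(current: Sequence[float]) -> List[float]:
    """Running totals from the offspring produced in each interval."""
    return list(accumulate(current))


# Hypothetical beaker totals on days 7, 14, 21.
cum = [0, 540, 1580]
cur = cumulative_to_current(cum)
print(cur)                         # [0, 540, 1040]
print(current_to_cumulative(cur))  # [0, 540, 1580]
```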

Recommendation: A standardized data reporting format should be adopted,
analogous to that discussed for the fathead minnow tests (Feder and
Collins [1]). A good start in that direction is the experimental
categorization summary sheet prepared by Dr. William van der Schalie,
which is shown in Table 1.1.
In order to carry out statistical analyses and data displays, the data
first had to be computerized. As the data were reported in a number of
different formats, they needed to be recoded in a somewhat uniform manner
amenable to analysis. The approach we took is indicated in the figures
below, which show computer listings of the various data sets.

Figures 1.1 to 1.4 are based on LeBlanc's tests on Compounds A and B.
Figures 1.1 and 1.3 contain 7, 14, and 21 day survival and cumulative
productivity, and day 0, 7, 14, and 21 measured toxicant concentrations.
Note that the concentrations given in the third field are nominal
concentrations. Note also that since only two concentration
determinations per group were made on each occasion, the "measured"
concentrations on each day in each group are really two duplicates
rather than four measured values. Figures 1.2 and 1.4 contain 7 day and
21 day length determinations on the adults surviving at those times.
(Note the 21 day length in beaker 7A but no corresponding 7 day length.
This value should be associated with beaker C.)

Figure 1.5 pertains to Adams' test on selenium. The concentrations
indicated are nominal concentrations. Survival data are given for
days 2, 4, 7, 9, 11, 14, 16, 18, and 21. Cumulative fertility per
surviving adult is indicated for these same days.

Figure 1.6 pertains to Chapman's test on beryllium. The concentrations
indicated are averages of measured concentrations for each treatment group.
Since just one daphnid is contained in each test chamber, various responses
can be measured that are not feasible with multiple daphnids per beaker,
as in the LeBlanc or Adams tests. Thus the listing gives time to death of
each daphnid (a value of 50 represents a censored value, i.e., the daphnid
survived beyond day 21), number of broods, time to first brood, numbers of
live young on days 3, 5, 7, 10, 12, 14, 17, 19, and 21, and lengths. Note
that in contrast to the LeBlanc and Adams productivity data, these
productivity values represent current rather than cumulative values.
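Because the listing codes survivors as 50, time-to-death data for individually housed daphnids can be summarized with a survival curve that treats those entries as right-censored. The Python sketch below uses the Kaplan-Meier product-limit estimator. This is a standard choice and not necessarily the method used later in this report; the function name and the ten example values are hypothetical. When the only censoring is at the end of the test, as here, the estimate at each death day equals the fraction of the group still alive.

```python
from typing import List, Sequence, Tuple

CENSORED = 50  # listing code: daphnid still alive at the end of the 21 day test


def kaplan_meier(days_to_death: Sequence[int],
                 censor_code: int = CENSORED) -> List[Tuple[int, float]]:
    """Kaplan-Meier survival estimate S(t) at each observed day of death.

    Entries equal to `censor_code` are right-censored at the end of the test,
    so they remain in the risk set at every death day.
    """
    death_days = sorted({d for d in days_to_death if d != censor_code})
    survival = 1.0
    curve = []
    for day in death_days:
        at_risk = sum(1 for d in days_to_death if d == censor_code or d >= day)
        deaths = sum(1 for d in days_to_death if d == day)
        survival *= 1.0 - deaths / at_risk
        curve.append((day, survival))
    return curve


# Hypothetical group of ten individually housed daphnids.
group = [50, 50, 12, 50, 17, 50, 50, 12, 50, 21]
for day, s in kaplan_meier(group):
    print(f"day {day:2d}: S = {s:.3f}")
# day 12: S = 0.800, day 17: S = 0.700, day 21: S = 0.600
```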

Figures 1.7 and 1.8 pertain to Goulden's test on isophorone. The
concentrations indicated are nominal concentrations. Figure 1.7 contains
survival data for each of the test chambers on days 0, 2, 4, 6, 8, 11,
13, 15, 18, and 21. Figure 1.8 contains production data for the daphnids
in beakers 1-7 within each group (individual daphnids per beaker). Time
to death (50 represents a censored value), number of broods, time to
first brood, and numbers of live young on days 2, 4, 6, 8, 11, 13, 15,
18, and 21 are also given. As with the Chapman data, the production
figures represent current rather than cumulative values.


[Computer listing: investigator EG&G Bionomics, Test A; 21 day flow-through test, Daphnia magna; 1 control group (1), 1 solvent group (2), 5 treatment groups (3-7). Fields: group, concentration (mg/l), number tested, number live (days 7, 14, 21), cumulative fertility (days 7, 14, 21).]
[Computer listing: investigator EG&G Bionomics, Test B; 21 day flow-through test, Daphnia magna; 1 control group (1), 1 solvent group (2), 5 treatment groups (3-7); 4 beakers per group. Fields: group, replicate, nominal concentration (mg/l), number tested, number live (days 7, 14, 21), cumulative fertility (days 7, 14, 21), measured concentration (days 0, 7, 14, 21).]
Figure 1.5. Adams - Selenium - 21 day survival and production data
II. PRELIMINARY GRAPHICAL DATA DISPLAYS


Graphing the data is generally considered to be a good first step in
statistical analysis. Graphs provide insights into the structure of the
data and reveal the presence of possibly unanticipated relations or
anomalies. This section contains preliminary displays of survival,
production, and length responses versus group and versus concentration.

Figures II.1 to II.10 display trends in 21 day mortality (or survival)
with group and with concentration for the LeBlanc, Adams, Chapman, and
Goulden tests.

Figures II.1 to II.4a pertain to LeBlanc's tests with Compounds A and
B, respectively. These tests consist of a water control group, a solvent
control group, and five treatment groups, with four replicate test
chambers per group. Figures II.1, II.2, and II.2a show both a higher
mortality level and a higher average measured concentration in the water
control group than in the solvent control group for test A (15 percent
versus 3.75 percent mortality, and 0.0010 mg/l versus 0.0006 mg/l
concentration, respectively).

There is no trend in mortality among the first three treatment groups (up
to nominal concentration 0.0120 mg/ℓ), and then a rather rapid rise as the
nominal concentration is doubled and then doubled again from that point.

No outlying observations are evident; however, there is some question about
the homogeneity of responses across replicates in group 6. Figures II.3, II.4,
and II.4a show virtually no trend in mortality with increasing concentration
in test B up to group 6. There is then a substantial increase in mortality
between group 6 and group 7. No outlying results are evident, but there is
some question about the homogeneity of responses across replicate test
chambers in group 7.

Figures II.5 and II.6 pertain to Adams' test with selenium. This test
consists of a control group and seven treatment groups with three replicate
test chambers per group. There appears to be no trend in mortality (and
very little mortality) in groups 1 to 4. There is then a sudden jump in
mortality between groups 4 and 5, and mortality remains at 100 percent
thereafter. No outlying responses are evident. We conclude from these
graphs that the dose response relation is very steep.

Figure II.7 pertains to Chapman's test with beryllium. The test consists
of water and solvent control groups and six treatment groups, with ten
beakers per group and an individual daphnid in each beaker. The mortality
rates plotted represent average mortality over the ten beakers within each
group. There appears to be no systematic trend in mortality with increasing
concentration, nor do there appear to be any outlying responses.
Figures II.8, II.9 and II.10 pertain to Goulden's test with isophorone.

The test consists of a control group and five treatment groups. Seven of
the ten beakers per group contain individual daphnids, while the other three
beakers per group contain multiple daphnids. Note that survival rather than
mortality is plotted in these figures. For each group or concentration, the
plot shows the average survival in the seven individual beakers along with
the survival in each of the remaining three beakers. In four of the six
groups, survival of the individually housed daphnids exceeds that of the
multiply housed daphnids. There appears to be a downward trend in the
survival rates for the beakers containing multiple daphnids. For the
individually housed daphnids, however, there is no trend in survival rates:
they remain at about 100 percent in the first five groups and then drop
sharply to about 15 percent.

Thus these plots suggest differences in mortality rates between the
individually and multiply housed daphnids. There is also a suggestion of a
possibly outlying response in group 5. This needs to be investigated further.

Figures II.11 to II.25 display trends in 21 day production versus group and
versus concentration for the LeBlanc, Adams, Chapman, and Goulden tests. For
the LeBlanc and Adams tests (with multiple daphnids per beaker), the fertility
measure used is cumulative production per surviving adult per beaker. For
the Chapman and Goulden tests (with individual daphnids per beaker), the
fertility measure used is cumulative production for each individual daphnid
that survived to the end of the test.

Figures II.11 to II.14 pertain to LeBlanc's tests with Compounds A and
B. Figures II.11 and II.12 show an increase in productivity of the solvent
control group in test A as compared to the water control group. There is
then a decrease in production with increasing concentration. The highest
treatment group has almost complete mortality (just one survivor of the 80
daphnids that began the test) and no offspring. For test B, Figures II.13
and II.14 show, among the treatment groups, first an increase in production
with increasing concentration and then a decrease. One of the values in
group 3 appears to be a possible outlier. There does not appear to be
heterogeneity of variance with increasing mean level.

Figures II.15 and II.16 pertain to Adams' test with selenium. We previously
observed 100 percent mortality in groups 5 through 8, and we now see that
there is no production in these groups either. (Virtually all the deaths in
these groups occurred by day 4, well before any of the daphnids in the test
were mature enough to produce offspring.) There appears to be a downward
trend in average production with increasing concentration in groups 1 to 4.
There is also a suggestion of increasing variability.

Figures II.17 to II.20 pertain to Chapman's test with beryllium.

Figure II.17 displays total young produced by each adult surviving to
the end of the test. The water control and solvent control groups appear
to be comparable and higher on average than the six treatment groups. There
is no suggestion of heterogeneity of variance or of outlying values. There
is no apparent trend in production levels among the six treatment groups.
Figures II.18 and II.19 display average total production for surviving
adults versus group number and versus log10 (concentration). In both plots
we see that the control groups' production levels are much higher than the
treatment group production levels, that the solvent control group has a
higher average production level than the water control group, and that the
treatment group average production levels show no trend with increasing
concentration. Figure II.20 shows the standard deviation of 21 day production
levels of survivors plotted against average production levels. The standard
deviation increases with the average level.
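A plot like Figure II.20 pairs each group's average with its standard deviation. The sketch below shows that computation; the production counts and group labels are hypothetical (they are not Chapman's data), and summarizing the direction of the trend by a least squares slope is an illustrative choice, not necessarily how the figure was prepared. A group with a single survivor is skipped because no standard deviation can be computed from one value.

```python
from statistics import mean, stdev

# Hypothetical 21 day production counts for surviving adults, by group.
# These values are invented for illustration only.
production = {
    "water control": [92, 110, 101, 125, 88],
    "solvent control": [118, 97, 140, 105, 131],
    "group 3": [45, 52, 40, 58, 47],
    "group 4": [38, 44, 51, 36, 42],
    "group 8": [12],  # a lone survivor: no standard deviation is possible
}

def mean_sd_pairs(data):
    """Return (group, mean, sd) for every group with at least two survivors."""
    return [(g, mean(v), stdev(v)) for g, v in data.items() if len(v) >= 2]

def ls_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

pairs = mean_sd_pairs(production)
for group, m, s in pairs:
    print(f"{group:16s} mean {m:6.1f}   sd {s:5.1f}")

slope = ls_slope([m for _, m, _ in pairs], [s for _, _, s in pairs])
print(f"slope of sd on mean: {slope:.3f}")  # positive: sd rises with the mean
```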

Figures II.21 to II.25 pertain to Goulden's test with isophorone.

Figure II.21 displays individual 21 day production for all daphnids, both
survivors and nonsurvivors. Recall that production data were recorded
only on the seven individually housed daphnids per group. We see a
definite quadratic trend in the plot, first increasing and then decreasing.
Figures II.22, II.23, and II.24 display 21 day production for all
individually housed daphnids that survived to the end of the test. Figure
II.22 is nearly identical in appearance to Figure II.21, since there was just
one death among individually housed daphnids in groups 1 to 5, and in group 6
there was no production among any of the seven daphnids, whether or not
they survived. Figure II.25 displays the standard deviation of production
versus average production for the surviving daphnids. No trend is evident.
Note that group 6 is not represented in this plot, since there was just one
surviving daphnid and so the standard deviation was not calculated.

Figures II.26 to II.34 display trends in average 21 day lengths and in
variation within beakers and among beakers within groups for the LeBlanc
and Chapman tests. Adams and Goulden do not report length data.

Figures II.26, II.27, and II.28 pertain to LeBlanc's test with Compound
A. Figure II.26 shows a higher average length among survivors in the
solvent control group than in the water control group. No trend in either
average length or variability of length is evident in groups 3 to 6. The
point in group 7 corresponds to just a single daphnid, the only survivor
in that group, and is relatively stunted. Figure II.27 displays the within
beaker standard deviations of length versus the within beaker average
lengths. A negative trend can be seen, indicating that variability
decreases with increasing length. This is opposite to what occurs with
most physical phenomena. Figure II.28 displays standard deviations
among beaker averages within groups versus means of beaker averages.
A clear, strong negative trend is evident; that is, variability decreases
with increasing length.

Figures II.29, II.30, and II.31 pertain to LeBlanc's test with Compound
B. Figure II.29 shows a lower average length among survivors in the solvent
control group than in the water control group. This is opposite to what
was observed for the Compound A test, which suggests that no single relation
between solvent control and water control lengths holds across tests. There
is little trend in either average length or variability of length in groups
3 to 6. The lengths in group 7 are clearly stunted. Figure II.30 displays the
within beaker standard deviations of length versus within beaker averages.
There is little trend in the plot; if any exists, it is slightly negative.
Figure II.31 displays the standard deviation among beaker averages within
groups versus the mean of beaker averages. A negative trend is evident,
although not as strong as for Compound A. The point at the far left of the
plot corresponds to group 7, which is not typical of the other groups.
However, a negative trend exists even without this point.

Figures II.32, II.33, and II.34 pertain to Chapman's test with beryllium.
Figures II.32 and II.33 show about the same average lengths in the solvent
control and water control groups and a decreasing trend in average
length with increasing concentration. The variability among lengths
decreases as the average length increases. This is the same phenomenon
that we saw with Compound A. Figure II.34 displays the standard
deviation among lengths within groups versus average length. There is a
clear negative trend.

We  thus  conclude  that  the  variability  of  daphnid  lengths  decreases  as 
average  length  increases.  This  type  of  phenomenon  would  be  compatible 
with  a  biological  upper  limit  on  daphnid  length  that  the  adult  daphnids 
are  approaching.  The  distribution  of  daphnid  lengths  would  then  be 
skewed  to  the  left. 
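The upper-limit explanation can be illustrated with a toy simulation. In the sketch below, length approaches a hypothetical ceiling `cap` through a saturating curve, `cap * (1 - exp(-g))`, driven by a normally distributed growth index `g`; the curve, the ceiling and the growth parameters are all assumptions chosen for illustration, not a fitted model of daphnid growth. As the mean growth index rises, average length approaches the ceiling, the standard deviation shrinks, and the skewness stays negative (skewed to the left).

```python
import math
import random
from statistics import mean, stdev

def sample_skewness(xs):
    """Approximate skewness: third standardized moment using the sample sd."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def simulate_lengths(growth_mean, n=5000, cap=5.0, growth_sd=0.3, seed=1):
    """Toy model: lengths saturate toward an assumed upper limit `cap`."""
    rng = random.Random(seed)
    return [cap * (1 - math.exp(-rng.gauss(growth_mean, growth_sd))) for _ in range(n)]

results = {}
for g in (0.5, 1.0, 2.0):
    lengths = simulate_lengths(g)
    results[g] = (mean(lengths), stdev(lengths), sample_skewness(lengths))
    print(f"growth {g}: mean {results[g][0]:.2f}, sd {results[g][1]:.3f}, "
          f"skewness {results[g][2]:.2f}")
```

Using the same seed for every growth level keeps the random draws identical, so the differences between rows come only from the position on the saturating curve.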


Figure II.1. LeBlanc - Test A. 21 day mortality by treatment group
Figure II.2a. LeBlanc - Test A. 21 day mortality by concentration


Figure II.3. LeBlanc - Test B. 21 day mortality by treatment group
Figure II.6. Adams - Selenium. 21 day mortality by log10 (concentration)
Figure II.8. Goulden - Isophorone. 21 day survival by group
Figure II.9. Goulden - Isophorone. 21 day survival by log (1 + concentration)




Figure II.10. Goulden - Isophorone. 21 day survival by concentration
Figure II.11. LeBlanc - Test A. 21 day cumulative fertility per surviving adult by group
Figure II.14. LeBlanc - Test B. 21 day cumulative fertility per surviving adult by log10 (concentration)


Figure II.15. Adams - Selenium. 21 day cumulative fertility per surviving adult by group


Figure II.16. Adams - Selenium. 21 day cumulative fertility per surviving adult by log10 (concentration)


Figure II.17. Chapman - Beryllium. Individual 21 day production for surviving adults by group
Figure II.19. Chapman - Beryllium. Average 21 day production per surviving adult by log10 (concentration)
Figure II.20. Chapman - Beryllium. Standard deviation of 21 day production for
surviving adults by average production


Figure II.21. Goulden - Isophorone. Individual 21 day production for both
surviving and nonsurviving adults by group


Figure II.22. Goulden - Isophorone. Individual 21 day production for surviving adults by group
Figure II.25. Goulden - Isophorone. Standard deviation of 21 day production
for surviving adults by average production
Figure II.28. LeBlanc - Test A. Standard deviations of average 21 day lengths among beakers
within groups by means of beaker averages within groups (note that group 7
is omitted since just one beaker had a survivor)


Figure II.30. LeBlanc - Test B. Standard deviations of 21 day lengths of survivors
within beakers by mean lengths within beakers


Figure II.31. LeBlanc - Test B. Standard deviations of 21 day lengths among beakers
within groups by means of beaker averages within groups


Figure II.33. Chapman - Beryllium. 21 day lengths of survivors within beakers by log10
(concentration). M's correspond to group means.
III. TESTING FOR BEAKER TO BEAKER HETEROGENEITY
WITHIN TREATMENT GROUPS - SURVIVAL DATA


A.  BACKGROUND 

Toxicity tests with Daphnia magna generally include several replicate
beakers per treatment or control group in order to be able to assess
variability of response. An important preliminary inference is to determine
whether there is any statistical evidence of variation in response rate
across beakers within groups. For survival data, it is of interest to
determine whether the variation in mortality rates across beakers within
groups is compatible with that expected under binomial theory or whether
it exceeds it. In the former case the data may be pooled across beakers
and subsequent analyses can be carried out based on binomial theory (i.e.,
on a per daphnid basis). However, if beaker to beaker variation exists, then
standard errors based on binomial theory will underestimate the true
variability of responses. This would lead to hypothesis tests that falsely
reject the null hypothesis more often than the nominal level (i.e., inflated
type I error), confidence intervals that are too short (i.e., attained
confidence level lower than nominal), and simultaneous inference procedures
arriving at no effect levels that are too low.
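A small numerical sketch shows the size of the understatement. The beaker counts below are hypothetical, patterned loosely on a group whose replicate beakers split into low and high mortality. The code compares the binomial variance of the pooled mortality rate (per daphnid analysis) with the variance of the mean rate estimated from the replicate beakers (per beaker analysis).

```python
from statistics import variance

# Hypothetical deaths in four beakers of 20 daphnids, patterned loosely on a
# group whose replicate beakers split into low and high mortality.
deaths = [2, 3, 17, 16]
n_per_beaker = 20

rates = [d / n_per_beaker for d in deaths]
n_total = n_per_beaker * len(deaths)
p_pooled = sum(deaths) / n_total

# Per daphnid analysis: treat all 80 daphnids as independent binomial trials.
var_binomial = p_pooled * (1 - p_pooled) / n_total

# Per beaker analysis: variance of the mean beaker rate from the replicates,
# valid whether or not extra binomial variation is present.
var_beakers = variance(rates) / len(rates)

ratio = var_beakers / var_binomial
print(f"pooled mortality rate {p_pooled:.3f}")
print(f"binomial variance {var_binomial:.5f}; beaker based variance {var_beakers:.5f}")
print(f"variance ratio {ratio:.1f}")
```

For these counts the beaker based variance is roughly 13 times the binomial variance, so a binomial standard error would be too small by a factor of about 3.6.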

To account for possible beaker to beaker heterogeneity, variability
estimates are generally based on sample variances of the observed survival
rates in the replicate beakers within groups. Such variability estimates
are appropriate whether or not beaker to beaker variation exists; however,
they are based on relatively few degrees of freedom and so reduce the
sensitivity of subsequent analyses in the event that there is in fact no
extra binomial variation. There is thus a tradeoff between possible
underestimation of variability on the one hand and possible loss of
sensitivity on the other. A reasonable compromise procedure is to first
carry out a test for heterogeneity among beakers within groups. If the test
accepts the null hypothesis of no heterogeneity, we pool data across beakers
and base subsequent analyses on binomial theory (i.e., per daphnid analysis).
If the test rejects the null hypothesis of no heterogeneity, we either base
subsequent analyses on sample response rates within each beaker (i.e., per
beaker analyses), as is usually done, or else adjust the sample sizes
downward to effective sample sizes and then pool the adjusted data across
beakers. The latter approach has been described and illustrated for fathead
minnow data in Feder and Collins [1], Sections VII, VIII and IX, and will be
further illustrated in this report for Daphnia magna data. In this section
we confine attention to testing for heterogeneity among beakers. We discuss
adjustment procedures in subsequent sections.
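One common way to carry out the downward adjustment divides the pooled counts by a variance inflation factor (a design effect): the beaker based variance of the mean rate divided by the binomial variance of the pooled rate. This particular formula is an assumption for illustration and is not necessarily the adjustment used by Feder and Collins [1]; the function name `effective_counts` and the beaker counts are hypothetical.

```python
from statistics import variance

def effective_counts(deaths, n_per_beaker):
    """Shrink pooled deaths and sample size by a variance inflation factor.

    The factor is the beaker based variance of the mean mortality rate divided
    by the binomial variance of the pooled rate, floored at 1 so that the data
    are never inflated when beakers vary less than binomial theory predicts.
    """
    k = len(deaths)
    n_total = n_per_beaker * k
    d_total = sum(deaths)
    p = d_total / n_total
    var_binomial = p * (1 - p) / n_total
    if var_binomial == 0:  # no deaths or no survivors: nothing to adjust
        return float(d_total), float(n_total), 1.0
    var_beakers = variance([d / n_per_beaker for d in deaths]) / k
    factor = max(1.0, var_beakers / var_binomial)
    return d_total / factor, n_total / factor, factor

# Hypothetical deaths in four beakers of 20 daphnids each.
d_eff, n_eff, factor = effective_counts([4, 6, 9, 5], 20)
print(f"inflation factor {factor:.3f}: {d_eff:.1f} effective deaths of {n_eff:.1f}")
```

Here the factor is about 1.11, so the 24 deaths among 80 daphnids are treated as roughly 21.6 deaths among 72. The pooled rate of 0.3 is unchanged, but its binomial standard error is widened to reflect the beaker to beaker variation.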

Feder and Collins [1], Sections VII and VIII, remark that problems
can arise with some of the "standard" procedures for testing for beaker to
beaker variation within groups. These tests are based on asymptotic theory
and sometimes on a specific form of dose response model. There are two
possible types of difficulties with the usual asymptotic chi square tests
of goodness of fit. First, the weights used in the denominator of the chi
square statistic are inappropriate if the assumed form of dose response
model is inappropriate, which may bias the results. For example, a chi square
test statistic for heterogeneity among beakers within groups based on a
model of constant mortality rates across treatment groups has constant
probability weights across groups in the denominator, and thus would be
inappropriate if in fact there is a trend in mortality rate with increasing
group number. Similarly, a chi square test statistic for heterogeneity based
on a probit model has probit based weights in the denominator, and so would
be inappropriate if the probit model is inappropriate, etc.
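The first difficulty can be seen in a small hypothetical example. Two groups are each perfectly homogeneous across their four beakers but have different mortality rates. A chi square that takes its expected counts from one common mortality rate for the whole test attributes the between-group trend to beaker to beaker heterogeneity. Chi squares computed within each group from that group's own pooled rate do not make this mistake. The function and the data are illustrative only.

```python
def pearson_chi2(deaths, sizes, p):
    """Pearson chi square over beakers, with the expected death probability p
    supplied by whichever model is assumed."""
    chi2 = 0.0
    for d, n in zip(deaths, sizes):
        e_dead, e_alive = n * p, n * (1 - p)
        chi2 += (d - e_dead) ** 2 / e_dead + ((n - d) - e_alive) ** 2 / e_alive
    return chi2

n = 20  # daphnids per beaker
# Hypothetical test: each group is perfectly homogeneous across its beakers,
# but mortality rises from the low to the high concentration group.
groups = {"low": [2, 2, 2, 2], "high": [10, 10, 10, 10]}

# (a) Weights from a constant mortality model: one rate for every beaker.
all_deaths = [d for ds in groups.values() for d in ds]
p_common = sum(all_deaths) / (n * len(all_deaths))
chi2_common = pearson_chi2(all_deaths, [n] * len(all_deaths), p_common)

# (b) No structure across groups: each group uses its own pooled rate.
chi2_within = sum(
    pearson_chi2(ds, [n] * len(ds), sum(ds) / (n * len(ds))) for ds in groups.values()
)

print(f"constant-rate weights: chi square {chi2_common:.2f}")
print(f"within-group weights:  chi square {chi2_within:.2f}")
```

The constant-rate statistic is about 30.5 on 7 degrees of freedom even though there is no beaker to beaker variation at all within either group. The within-group statistics are both 0.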

A second possible problem with the "standard" heterogeneity tests results
from their asymptotic nature. The validity of the asymptotic chi square
theory on which they are based depends on the expected response frequencies
being large enough. If just a single response is observed in a group with
very small expected frequency, the contribution of that group to the overall
chi square value can be dominant and can strongly bias the resulting test
of heterogeneity among beakers. This situation was demonstrated by Feder
and Collins [1], Tables VII.1 and VII.2. Thus the relatively small sample
sizes, coupled with response rates close to 0 or 1, that are fairly
common in aquatic toxicity tests result in small expected frequencies
and therefore often invalidate the assumptions underlying heterogeneity
tests based on asymptotic theory.
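The second difficulty can be made concrete with the smallest possible departure from complete mortality: four beakers of 20 daphnids with a single survivor, as in the highest treatment group of LeBlanc's Test A. The sketch computes the expected frequencies (0.25 survivors per beaker), the Pearson statistic and its cell contributions, and an asymptotic p-value from the closed-form chi square tail for 3 degrees of freedom, $P(\chi^2_3 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$. The helper names are hypothetical, and which beaker holds the survivor does not affect the result.

```python
import math

def expected_counts(deaths, sizes):
    """Expected (dead, alive) counts per beaker under a common within-group rate."""
    n_total, d_total = sum(sizes), sum(deaths)
    return [(n * d_total / n_total, n * (n_total - d_total) / n_total) for n in sizes]

def cell_contributions(deaths, sizes):
    """(O - E)^2 / E for every dead and alive cell, beaker by beaker."""
    cells = []
    for (e_dead, e_alive), d, n in zip(expected_counts(deaths, sizes), deaths, sizes):
        cells.append((d - e_dead) ** 2 / e_dead)
        cells.append(((n - d) - e_alive) ** 2 / e_alive)
    return cells

def chi2_sf_3df(x):
    """Upper tail probability of a chi square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Four beakers of 20 with a single survivor (79 deaths in all).
deaths, sizes = [19, 20, 20, 20], [20, 20, 20, 20]
cells = cell_contributions(deaths, sizes)
chi2 = sum(cells)
min_expected = min(min(pair) for pair in expected_counts(deaths, sizes))
use_exact = min_expected < 5             # cutoff of 5 used with EXAX2 in this report
p_asymptotic = chi2_sf_3df(chi2)         # 3 df = 4 beakers - 1

print(f"chi square {chi2:.4f}; largest cell {max(cells):.4f}")
print(f"smallest expected frequency {min_expected:.2f}; exact test needed: {use_exact}")
print(f"asymptotic p-value {p_asymptotic:.3f}")
```

The statistic is 3.0380, the group 7 value in the Test A table below, and the single cell holding the survivor contributes 2.25 of it. The asymptotic p-value is about 0.39. The exact conditional test instead gives A = 1, because every placement of one survivor yields the same statistic. As a check on the tail formula, chi square values of 41.2 and 26.6667 give -2 ln A of about 37.886 and 23.764, the asymptotic entries for group 6 of Test A and group 7 of Test B.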

To account for these two problems, Feder and Collins carry out separate
chi square heterogeneity tests within each concentration group without
imposing any structure on the form of the concentration-response relation.
The test results are then pooled across groups to give an overall test.
The tests are based either on asymptotic theory or on exact, small sample
theory, depending on whether the expected response frequencies within
each cell are large or small. Our convention has been to use heterogeneity
tests based on exact, small sample theory if any expected response frequency
is less than 5. A computer program, EXAX2, has been developed to carry out
tests of heterogeneity of mortality rates among beakers within groups based
on exact, small sample theory. This program is described in detail and
illustrated with data from fathead minnow tests in Feder and Collins [1],
Section VIII, and in Feder and Willavize [2]; see those reports for details
of the program. In this section we illustrate the use of EXAX2 on several
sets of data from toxicity tests with Daphnia magna to test for beaker to
beaker heterogeneity in mortality response rates within groups.
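The exact conditional test can be sketched in a few lines of Python. This is not the EXAX2 program; it is a minimal reimplementation under three assumptions: (i) all tables with the observed beaker sizes and total deaths are enumerated with their multivariate hypergeometric probabilities; (ii) $A_i$ is the probability of a chi square at least as large as the one observed; and (iii) $E(-2\ln A_i)$ and $\mathrm{Var}(-2\ln A_i)$ are the mean and variance of $-2\ln A_i$ over that same null distribution. Brute-force enumeration is adequate for a few beakers of 5 to 20 daphnids. The function names are hypothetical.

```python
from itertools import product
from math import comb, log, prod

def pearson_chi2(deaths, sizes):
    """Pearson chi square for the 2 x k (dead/alive by beaker) table of one group."""
    n_total, d_total = sum(sizes), sum(deaths)
    chi2 = 0.0
    for d, n in zip(deaths, sizes):
        e_dead = n * d_total / n_total
        e_alive = n - e_dead
        # the alive cell deviates by the same amount as the dead cell
        chi2 += (d - e_dead) ** 2 / e_dead + (e_dead - d) ** 2 / e_alive
    return chi2

def null_tables(d_total, sizes):
    """Yield every death vector with the observed margins and its probability
    under homogeneity (multivariate hypergeometric)."""
    denom = comb(sum(sizes), d_total)
    for deaths in product(*(range(n + 1) for n in sizes)):
        if sum(deaths) == d_total:
            yield deaths, prod(comb(n, d) for d, n in zip(deaths, sizes)) / denom

def exact_heterogeneity(deaths, sizes, tol=1e-9):
    """Return (chi2, A, -2 ln A, E(-2 ln A), Var(-2 ln A)) for one group."""
    d_total = sum(deaths)
    if d_total in (0, sum(sizes)):
        raise ValueError("table degenerate: no deaths or no survivors")
    dist = [(pearson_chi2(t, sizes), p) for t, p in null_tables(d_total, sizes)]

    def sig_level(x):
        # conditional probability of a chi square at least as large as x
        return min(1.0, sum(p for c, p in dist if c >= x - tol))

    observed = pearson_chi2(deaths, sizes)
    a_obs = sig_level(observed)
    scores = [(-2 * log(sig_level(c)), p) for c, p in dist]
    mean_score = sum(w * p for w, p in scores)
    var_score = sum(w * w * p for w, p in scores) - mean_score ** 2
    return observed, a_obs, -2 * log(a_obs), mean_score, max(0.0, var_score)

# Illustrative counts: 1, 1 and 0 deaths among five daphnids in each of three beakers.
chi2, a, score, e_score, var_score = exact_heterogeneity([1, 1, 0], [5, 5, 5])
print(f"chi2 {chi2:.4f}  A {a:.4f}  -2lnA {score:.4f}  E {e_score:.4f}  Var {var_score:.4f}")
```

With these illustrative counts the sketch gives chi square 1.1538, A = 1, E(-2 ln A) = 0.7159 and Var(-2 ln A) = 1.2812. These are the values EXAX2 reports for group 1 of the Goulden multiple-daphnid comparison later in this section. The actual counts for that group are not listed here, so the agreement shows only that the assumed definitions are consistent with the program's output.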

B. APPLICATION OF EXAX2 TO TEST FOR HETEROGENEITY AMONG BEAKERS
WITHIN GROUPS WITH DAPHNIA MAGNA DATA

In this subsection we illustrate the results of EXAX2 comparisons of
21 day mortality rates among replicate beakers within groups using data
from a number of toxicity tests. The EXAX2 outputs are shown in the
referenced figures, with the observed and expected cell frequencies
indicated. If any of the expected cell frequencies in a group is lower
than the (user specified) cutoff of 5, exact distribution theory is used
for the comparison in that group: the exact distribution of the chi square
statistic, conditional on the marginal totals, is enumerated and displayed.
For each group $i$, the observed value of chi square, the observed
significance level $A_i$, $-2\ln A_i$, $E(-2\ln A_i)$, and
$\mathrm{Var}(-2\ln A_i)$ are calculated. The separate independent tests for
each treatment or control group are combined by summing $-2\ln A_i$,
$E(-2\ln A_i)$, and $\mathrm{Var}(-2\ln A_i)$ over groups and calculating the
standardized test statistic, $Z$, which is then compared to a standard
normal distribution.

We  now  illustrate  this  procedure. 

LeBlanc  Test  A  21  day  Mortality 

The test has a water control group, a solvent control group, and five
treatment groups, with four replicate test chambers per group and 20
daphnids per chamber at the start. The results from the EXAX2 calculations
are shown in Figures III.1 to III.7 and are summarized below. The mortality
results are displayed graphically in Figure II.1. It is obvious from
Figure II.1 that there is great heterogeneity of mortality rates in group 6
and that group 3 possibly exhibits some heterogeneity. Other than that,
the results appear to be homogeneous across beakers within groups. These
conclusions from Figure II.1 are supported by the EXAX2 output. Figures
III.1, III.2, III.4, III.5 and III.7 show no significant heterogeneity of
mortality across beakers within groups 1, 2, 4, 5 and 7, respectively.
Figure III.3 shows marginal evidence of heterogeneity across beakers, with
beakers 3 and 4 having somewhat greater mortality than beakers 1 and 2.
Figure III.6 shows substantial heterogeneity of response across beakers:
beakers 3 and 4 have more than 75 percent mortality, while beakers 1 and 2
have less than 20 percent mortality. This is of course both highly
statistically and highly biologically significant. The pooled significance
level calculations are presented below the results for group 7. Z is highly
statistically significant, since the probability of a standard normal
deviate exceeding 5.553 by chance is essentially 0.

LE BLANC TEST A 21 DAY MORTALITY

Trt   Method        Chi Sq      Ai     -2 ln Ai   E(-2 ln Ai)   Var(-2 ln Ai)
 1    exact         0.0000    1.0000     0.0000      1.6609        3.6367
 2    exact         3.8095    0.6105     0.9869      0.8687        1.6419
 3    exact         7.7714    0.0757     5.1619      1.5750        3.4893
 4    exact         4.2270    0.3445     2.1312      1.5068        3.1226
 5    exact         5.4795    0.1975     3.2436      1.5068        3.1226
 6    asymptotic   41.2000    0.0000    37.8863      2.0000        4.0000
 7    exact         3.0380    1.0000     0.0000      0.0000        0.0000

Σ -2 ln Ai = 49.410     Σ E(-2 ln Ai) = 9.1183     Σ Var(-2 ln Ai) = 19.0129
Z = 5.553


The  highly  significant  pooled  result  is  due  to  group  6.  Without  this  group, 
the  value  of  Z  would  be  just  1.00.  We  thus  conclude  that  there  is  overall 
strong  statistical  evidence  of  beaker  to  beaker  heterogeneity  within  groups 
and  this  is  due  primarily  to  the  strong  dichotomy  of  mortality  results  in 
group  6. 
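The pooling itself is simple bookkeeping. The sketch below sums the three tabulated columns for Test A and can drop a group, as in the comparison without group 6. Applied to all seven groups, it reproduces the printed totals (49.410, 9.1183 and 19.0129) to rounding. It stops at the sums, because this section does not spell out the exact standardization EXAX2 uses to form Z. The helper name `pooled_sums` is hypothetical.

```python
# Per group values copied from the LeBlanc Test A table:
# (group, -2 ln A_i, E(-2 ln A_i), Var(-2 ln A_i))
test_a = [
    (1, 0.0000, 1.6609, 3.6367),
    (2, 0.9869, 0.8687, 1.6419),
    (3, 5.1619, 1.5750, 3.4893),
    (4, 2.1312, 1.5068, 3.1226),
    (5, 3.2436, 1.5068, 3.1226),
    (6, 37.8863, 2.0000, 4.0000),
    (7, 0.0000, 0.0000, 0.0000),
]

def pooled_sums(rows, exclude=()):
    """Sum -2 ln A_i, E(-2 ln A_i) and Var(-2 ln A_i) over the groups kept."""
    kept = [r for r in rows if r[0] not in exclude]
    return tuple(round(sum(r[i] for r in kept), 4) for i in (1, 2, 3))

all_groups = pooled_sums(test_a)
without_6 = pooled_sums(test_a, exclude={6})
print("all groups:     ", all_groups)
print("without group 6:", without_6)
```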

It  is  interesting  to  note  that  in  both  group  3  and  group  6,  where 
dichotomies  in  mortality  exist,  beakers  numbered  3  and  4  exhibit  somewhat 
greater  mortality  rates  than  beakers  numbered  1  and  2.  One  cannot  help 
but  wonder  whether  this  is  a  coincidence  or  whether  there  is  some  systematic 
difference  between  beakers  having  different  numbers.  This  could  be  related 
to  source  of  test  daphnids,  placement  or  handling  of  beakers,  connections 
to  proportional  diluter,  or  some  other  factors  associated  with  experimental 
technique.  These  issues  should  be  discussed  in  detail  with  the  investigator 
and  any  systematic  effects  should  be  taken  into  account  in  subsequent 
analyses  and  interpretations.  However  we  leave  this  issue  here  and  do  not 
pursue  it  in  this  study. 

LeBlanc  Test  B  21  day  Mortality 

The setup of the test is the same as that for Test A, namely water and
solvent control groups, five treatment groups, four beakers per group, and
20 daphnids per beaker at the start. The results from the EXAX2 calculations
are summarized below. The mortality results are displayed graphically
in Figure II.3. The pattern of results is remarkably like that in Test A.

It is obvious from Figure II.3 that group 7 exhibits the same kind of
substantial dichotomy of mortality rates that group 6 exhibited in Test A.
Some of the other groups show suggestions of heterogeneity.

LE BLANC TEST B 21 DAY MORTALITY

Trt   Method        Chi Sq      Ai     -2 ln Ai   E(-2 ln Ai)   Var(-2 ln Ai)
 1    exact         7.7778    0.0711     5.2879      1.5361        3.4424
 2    exact         1.0103    0.9089     0.1911      1.7273        3.6654
 3    exact         4.2105    0.3222     2.2652      1.1245        2.3815
 4    exact         5.7600    0.1591     3.6768      1.3590        2.6405
 5    exact         1.7451    0.7153     0.6702      1.7273        3.6654
 6    exact         4.4444    0.2521     2.7562      1.5361        3.4424
 7    asymptotic   26.6667    0.0000    23.7637      2.0000        4.0000

Σ -2 ln Ai = 38.6110     Σ E(-2 ln Ai) = 11.0104     Σ Var(-2 ln Ai) = 23.2375
Z = 3.9863


The  pooled  significance  level  calculations  are  presented  below  those  for 
group  7.  Z  is  highly  statistically  significant  since  the  probability  of 
a  standard  normal  deviate  exceeding  3.9863  by  chance  is  about  0.0001.  The 
highly significant pooled result is due to group 7. Without this group,
the value of Z would be just 1.165. We thus conclude that there is overall
strong statistical evidence of beaker to beaker heterogeneity within groups,
and this is due primarily to the strong dichotomy of mortality results in
group 7.

It is curious that the patterns of beaker to beaker heterogeneity are
so similar in Tests A and B. Each test has one group with a very
substantial dichotomy of mortality rates, while the other groups demonstrate
little or no heterogeneity.

Adams-Selenium  21  Day  Mortality 

This test is a static renewal test. The test has a control group and seven
treatment groups, with three replicate beakers per group and five daphnids
per beaker at the start. The results from the EXAX2 calculations are
summarized below. The mortality results are displayed in Figure II.5. We
see that in each group there is either nearly no mortality or complete
mortality. The middle portion of the dose response curve undoubtedly lies
between the concentrations in groups 4 and 5. There is, of course, no
suggestion of heterogeneity within groups.

ADAMS-SELENIUM 21 DAY MORTALITY

Trt   Method              Chi Sq      Ai     -2 ln Ai   E(-2 ln Ai)   Var(-2 ln Ai)
 1    exact              2.14286    1.000     0.0000      0.0000        0.0000
 2    table degenerate              1.000     0.0000      0.0000        0.0000
 3    exact              2.14286    1.000     0.0000      0.0000        0.0000
 4    table degenerate              1.000     0.0000      0.0000        0.0000
 5    table degenerate              1.000     0.0000      0.0000        0.0000
 6    table degenerate              1.000     0.0000      0.0000        0.0000
 7    table degenerate              1.000     0.0000      0.0000        0.0000
 8    table degenerate              1.000     0.0000      0.0000        0.0000

Σ -2 ln Ai = 0.000     Σ E(-2 ln Ai) = 0.000     Σ Var(-2 ln Ai) = 0.000
Z is indeterminate

In  brief  we  see  no  beaker  to  beaker  heterogeneity  within  groups  since  there 
is  essentially  either  no  mortality  or  complete  mortality  within  each  group. 
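The two kinds of rows in this table can be illustrated directly. When a group has a single death, every placement of that death among three beakers of five gives the same chi square, 15/7 ≈ 2.14286. The exact significance level is therefore A = 1, and -2 ln A is 0 for every possible outcome, which makes its mean and variance 0 as well. When no daphnid dies, or all die, one margin of the table is zero and the chi square is undefined; this is presumably what the "table degenerate" entries indicate. The sketch assumes five daphnids per beaker, and the function is hypothetical.

```python
from itertools import product

def pearson_chi2(deaths, sizes):
    """Pearson chi square for a dead/alive by beaker table; None if degenerate."""
    n_total, d_total = sum(sizes), sum(deaths)
    if d_total in (0, n_total):
        return None  # a zero margin leaves expected counts of 0
    chi2 = 0.0
    for d, n in zip(deaths, sizes):
        e_dead = n * d_total / n_total
        chi2 += (d - e_dead) ** 2 / e_dead + (e_dead - d) ** 2 / (n - e_dead)
    return chi2

sizes = [5, 5, 5]
# Every way of placing a single death among three beakers of five daphnids.
one_death = [t for t in product(range(2), repeat=3) if sum(t) == 1]
values = {round(pearson_chi2(t, sizes), 6) for t in one_death}
print(values)                           # one attainable value, so A = 1
print(pearson_chi2([0, 0, 0], sizes))   # None: no deaths to compare
print(pearson_chi2([5, 5, 5], sizes))   # None: complete mortality
```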
Goulden-Isophorone 21 Day Mortality

This test is also a static renewal test. The test has a control group and
five treatment groups. Each group consists of 10 beakers. Beakers 1-7
contain just individual daphnids; these are intended primarily to measure
productivity. Beakers 8-10 contain five daphnids each; these are intended
to estimate survival. Several different EXAX2 runs were carried out.
In one run, mortality rates in beakers 8, 9 and 10 were compared against
one another separately within each group, to determine whether there is any
evidence of beaker to beaker heterogeneity among the multiple daphnid
beakers within groups. In another EXAX2 run, mortality was pooled across
beakers 1-7 and this pooled mortality was compared with mortality pooled
across beakers 8, 9 and 10 within each group, to determine whether there is
any evidence of differences in mortality rates between singly housed and
multiply housed daphnids within each dose group. The survival results are
displayed graphically in Figure II.8. The pooled survival rates for the
individually housed daphnids are plotted with the plotting symbol "7". The
survival rates for the beakers with multiple daphnids are plotted with the
plotting symbol "*". If two multiple daphnid survival rates coincide, their
common value is plotted with a "2". If one or two "*"s coincide with a "7",
their common value is plotted with an "8" or "9". We see that, with the
exception of group 6, there is just one death among singly housed daphnids.
The multiply housed daphnids appear to have somewhat greater mortality. The
results from the EXAX2 calculations comparing mortality rates among the
multiply housed daphnids are summarized below.

GOULDEN-ISOPHORONE 21 DAY MORTALITY - COMPARISONS
AMONG BEAKERS WITH MULTIPLE DAPHNIDS

Trt  Method  Chi Sq   Λi       -2 ln Λi  E(-2 ln Λi)  Var(-2 ln Λi)
 1   exact   1.1539   1.0000   0.0000    0.7159       1.2812
 2   exact   2.1429   1.0000   0.0000    0.0000       0.0000
 3   exact   2.1429   1.0000   0.0000    0.0000       0.0000
 4   exact   2.5000   0.7253   0.6424    0.7821       1.6103
 5   exact   6.9643   0.0676   5.3883    1.2813       2.9369
 6   exact   1.1539   1.0000   0.0000    0.7159       1.2812

Σ(-2 ln Λi) = 6.0307    ΣE(-2 ln Λi) = 3.4952    ΣVar(-2 ln Λi) = 7.1095
Z = 0.8221

The probability that a standard normal random variable exceeds 0.82 is about
0.20, so there is no statistical evidence of heterogeneity among beakers
within groups. Most of the heterogeneity that is observed comes from group 5,
where there is a marginal suggestion of beaker to beaker heterogeneity.
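EXAX2's internal algorithm is not documented in this section, so the Python sketch below is an assumption about what each row of the table above computes: the group's total deaths are held fixed, every allocation of those deaths to beakers is enumerated with its multivariate hypergeometric probability, Λi is the exact probability of a Pearson chi square at least as large as the one observed, and E(-2 ln Λi) and Var(-2 ln Λi) are the mean and variance of -2 ln Λi over that null distribution. The per-beaker death counts used to check it, (1,1,0), (1,0,0), (1,0,0), (2,1,0), (5,2,1) and (4,4,5) for groups 1-6, are inferred from the printed chi squares and the multiply housed mortality rates given later in this section; they are not listed in the report. With those counts the sketch reproduces the rows of the table and the column totals 6.0307, 3.4952 and 7.1095. The function names are hypothetical, and the standardization of the totals to Z is left as printed in the report.

```python
from math import comb, log


def allocations(total, sizes):
    """Yield every way of placing `total` deaths into beakers of the given sizes."""
    if len(sizes) == 1:
        if total <= sizes[0]:
            yield (total,)
        return
    for k in range(min(total, sizes[0]) + 1):
        for rest in allocations(total - k, sizes[1:]):
            yield (k,) + rest


def pearson_chi2(deaths, sizes):
    """Pearson chi square for the 2 x J table of dead/alive by beaker."""
    n, d = sum(sizes), sum(deaths)
    if d == 0 or d == n:  # all alive or all dead: no heterogeneity is possible
        return 0.0
    chi2 = 0.0
    for x, m in zip(deaths, sizes):
        exp_dead = m * d / n
        exp_live = m - exp_dead
        chi2 += (x - exp_dead) ** 2 / exp_dead + ((m - x) - exp_live) ** 2 / exp_live
    return chi2


def exact_heterogeneity(deaths, sizes, tol=1e-9):
    """Return (chi2, Lambda, -2 ln Lambda, E(-2 ln Lambda), Var(-2 ln Lambda))."""
    n, d = sum(sizes), sum(deaths)
    # Null distribution: the d deaths fall at random among the n daphnids,
    # i.e. a multivariate hypergeometric distribution over beakers.
    outcomes = []
    for alloc in allocations(d, sizes):
        ways = 1
        for x, m in zip(alloc, sizes):
            ways *= comb(m, x)
        outcomes.append((pearson_chi2(alloc, sizes), ways / comb(n, d)))

    def tail(c):
        # Exact P(chi2 >= c) under the null; `tol` absorbs floating point ties.
        return min(1.0, sum(p for stat, p in outcomes if stat >= c - tol))

    chi2 = pearson_chi2(deaths, sizes)
    lam = tail(chi2)
    # -2 ln Lambda evaluated at every possible outcome, paired with its null probability
    scores = [(-2 * log(tail(stat)), p) for stat, p in outcomes]
    mean = sum(s * p for s, p in scores)
    var = max(0.0, sum(s * s * p for s, p in scores) - mean ** 2)
    return chi2, lam, -2 * log(lam), mean, var


def combine(rows):
    """Column totals of -2 ln Lambda, its null means and its null variances."""
    return tuple(sum(row[i] for row in rows) for i in (2, 3, 4))
```

Because beaker sizes are passed explicitly, the same function also covers the 2 x 2 comparison of singly and multiply housed daphnids (sizes 7 and 15) and the conditional analyses below, in which the number alive at day 7 differs between beakers.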


We  now  pool  mortality  results  across  beakers  with  multiple  daphnids 
and  compare  these  with  mortality  results  for  the  individually  housed 
daphnids.  There  are  thus  two  subgroups  per  treatment  group.  The  mortality 
rates  for  the  individually  housed  daphnids  are  based  on  a  sample  of  size  7 
while  the  mortality  rates  for  the  multiply  housed  daphnids  are  based  on  a 
sample  of  size  15.  The  EXAX2  output  is  given  in  Figures  III. 8  to  III. 13 
and  the  results  are  summarized  below. 

GOULDEN-ISOPHORONE 21 DAY MORTALITY - COMPARISONS
BETWEEN POOLED MORTALITY RATES FOR INDIVIDUALLY HOUSED DAPHNIDS
AND POOLED MORTALITY RATES FOR MULTIPLY HOUSED DAPHNIDS

Trt  Method  Chi Sq   Λi       -2 ln Λi  E(-2 ln Λi)  Var(-2 ln Λi)
 1   exact   1.0267   0.5455   1.2123    0.9870       1.7847
 2   exact   0.3352   1.0000   0.0000    0.9870       1.7847
 3   exact   0.4889   1.0000   0.0000    0.7287       1.1379
 4   exact   1.6211   0.5227   1.2974    1.1614       2.2462
 5   exact   5.8667   0.0225   7.5912    1.4030       3.0042
 6   exact   0.0037   1.0000   0.0000    1.1614       2.2462

Σ(-2 ln Λi) = 10.1009    ΣE(-2 ln Λi) = 6.4286    ΣVar(-2 ln Λi) = 12.2040
Z = 0.9329

The probability of a standard normal random variable exceeding 0.9329 is
0.175. Thus, except for group 5, there is no statistical evidence of
differences in mortality rates between individually and multiply housed
daphnids. The statistically significant result in group 5 is due to the
single beaker in which all five daphnids died. Without this beaker the
mortality results would be


                Live   Die
Individual        7      0
Multiple          7      3


An approximate two-tailed probability of observing as extreme a result
just due to chance can then be obtained from the hypergeometric distribution.


We conclude that there is no statistical evidence of overall differences
in mortality rates between individually and multiply housed daphnids. The
interpretation of the significant difference in group 5 depends on the
reason for complete mortality in one of the multiple beakers. It should
be noted, however, that in four of the six groups the individually housed
daphnids had lower mortality rates than the multiply housed daphnids, and
the rates were essentially the same in a fifth group. Thus a significant
difference might have shown up had the sample sizes been greater.
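The hypergeometric calculation for the reduced group 5 table can be sketched in Python as follows. It assumes 0 deaths among the 7 individually housed daphnids and 3 deaths among the 10 remaining multiply housed daphnids (group 5 multiply housed mortality of 0.533 = 8/15, less the beaker of five in which all died). Because conventions for two-tailed exact probabilities differ, it returns both the doubled smaller tail and the sum over all outcomes no more probable than the one observed. The helper names are hypothetical, and this is not necessarily the calculation the report used.

```python
from math import comb


def hypergeom_pmf(k, n_row, deaths, total):
    """P(k of the `deaths` fall in a row of `n_row` daphnids out of `total`)."""
    return comb(n_row, k) * comb(total - n_row, deaths - k) / comb(total, deaths)


def two_tailed_p(observed, n_row, deaths, total):
    """Two common two-tailed conventions for a 2 x 2 table with fixed margins."""
    lo, hi = max(0, deaths - (total - n_row)), min(n_row, deaths)
    pmf = {k: hypergeom_pmf(k, n_row, deaths, total) for k in range(lo, hi + 1)}
    lower = sum(p for k, p in pmf.items() if k <= observed)
    upper = sum(p for k, p in pmf.items() if k >= observed)
    doubled = min(1.0, 2 * min(lower, upper))
    # Alternative: add up every outcome no more probable than the observed one
    p_obs = pmf[observed]
    summed = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    return doubled, summed


# Assumed reduced group 5 table: 0 of 7 individually housed and 3 of 10
# remaining multiply housed daphnids died (3 deaths among 17 in all).
doubled, summed = two_tailed_p(observed=0, n_row=7, deaths=3, total=17)
```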

The previous chi square based test is a two sided test. We can attain
greater power for detecting effects if we carry out a one sided test against
a specific alternative, for example that mortality is lower among individually
housed daphnids than among multiply housed daphnids. One way to test equality
of mortality rates against a one sided alternative, based on asymptotic
theory, is to perform an arcsine transformation and carry out a normal
theory based test. Although the assumption of asymptotic normality is
stretched a bit with the small sample sizes in this example, we carry out
the test for illustrative purposes.

Within each group, the transform $2\arcsin \hat{p}^{1/2}$, where $\hat{p}$ is
the observed mortality rate among $n$ daphnids, has an asymptotic normal
distribution with mean $2\arcsin p^{1/2}$ and standard deviation $1/n^{1/2}$,
where $p$ is the true mortality rate. If singly and multiply housed daphnids
have the same mortality rates, then the differences between their arcsine
transforms will have mean 0.


               Singly Housed (n = 7)           Multiply Housed (n = 15)
Group    p      2 arcsin p^1/2   Var       p      2 arcsin p^1/2   Var
  1    0            0           0.143    0.133       0.748        0.067
  2    0.143        0.775       0.143    0.067       0.522        0.067
  3    0            0           0.143    0.067       0.522        0.067
  4    0            0           0.143    0.20        0.927        0.067
  5    0            0           0.143    0.533       1.638        0.067
  6    0.857        2.366       0.143    0.867       2.394        0.067

Taking differences within each group yields

Group   DIFF (Mult - Single)   Std Err (DIFF)
  1            0.748               0.458
  2           -0.253               0.458
  3            0.522               0.458
  4            0.927               0.458
  5            1.638               0.458
  6            0.028               0.458


Under the null hypothesis, DIFF is asymptotically normal with mean 0 and
standard error $(1/7 + 1/15)^{1/2} = 0.458$. The average difference is 0.602
with a standard error of $0.458/6^{1/2} = 0.187$. Thus the one sided test
($0.602/0.187 = 3.22$) demonstrates a statistically significantly greater
mortality rate among the multiply housed daphnids than among the singly
housed daphnids. If we exclude group 5 because of the beaker with no
survivors, the average difference in the remaining five groups is 0.394
with a standard error of $0.458/5^{1/2} = 0.205$. The probability of a
standard normal deviate exceeding $0.394/0.205 = 1.922$ is 0.027. Thus even
without group 5, this one sided test suggests some statistical evidence for
greater mortality among the multiply housed daphnids. However, the validity
of the asymptotic theory in this example is questionable, and the validity
of this one sided asymptotic test should be studied in greater detail.
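A minimal Python sketch of the one sided arcsine test follows. The death counts are read off the mortality rates in the arcsine table (7 singly and 15 multiply housed daphnids per group), and the function names are hypothetical. It recomputes the average difference, its standard error, and the one sided normal tail probability for all six groups and with group 5 excluded; the small discrepancies from the text (for example z of about 1.927 rather than 1.922) come from the report's use of rounded intermediate values.

```python
from math import asin, erfc, sqrt

N_SINGLE, N_MULTI = 7, 15
# Deaths per group (groups 1-6), read off the mortality rates in the arcsine table
DEATHS_SINGLE = [0, 1, 0, 0, 0, 6]
DEATHS_MULTI = [2, 1, 1, 3, 8, 13]


def arcsine(deaths, n):
    """Variance stabilizing transform 2 arcsin(p^1/2); asymptotic variance 1/n."""
    return 2 * asin(sqrt(deaths / n))


def one_sided_test(group_indices):
    """Average DIFF (multiply minus singly housed), its std err, z and P(Z > z)."""
    diffs = [arcsine(DEATHS_MULTI[g], N_MULTI) - arcsine(DEATHS_SINGLE[g], N_SINGLE)
             for g in group_indices]
    se_diff = sqrt(1 / N_SINGLE + 1 / N_MULTI)   # about 0.458 for each DIFF
    mean = sum(diffs) / len(diffs)
    se_mean = se_diff / sqrt(len(diffs))
    z = mean / se_mean
    p_upper = 0.5 * erfc(z / sqrt(2))            # upper tail of the standard normal
    return mean, se_mean, z, p_upper


all_groups = one_sided_test(range(6))            # groups 1-6
without_5 = one_sided_test([0, 1, 2, 3, 5])      # group 5 (index 4) excluded
```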

In summary, the exact small sample heterogeneity test based on the chi
square statistic reveals no overall heterogeneity among the mortality
responses in the beakers with multiple daphnids. It also reveals no overall
heterogeneity between average mortality per group for the individually and
the multiply housed daphnids. The one sided asymptotic test, however, reveals
some statistical evidence of differences in mortality between the
individually and multiply housed daphnids. The conclusions about differences
in mortality between singly and multiply housed daphnids are thus tentative.


C. APPLICATION OF EXAX2 TO TEST FOR HETEROGENEITY AMONG BEAKERS WITHIN
GROUPS AFTER ADJUSTING FOR EARLY LIFE STAGE MORTALITY

The  mortality  comparisons  among  beakers  within  groups  discussed  in  the 
previous  subsection  were  all  based  on  21  day  mortality.  This  measure  of 
mortality  is  an  overall  measure  and  encompasses  both  early  and  later  stage 
mortality.  In  each  of  the  data  sets  discussed  in  this  study,  mortality  is 
measured  at  a  number  of  points  in  time  during  the  course  of  the  test. 
LeBlanc  measures  mortality  at  7,  14  and  21  days.  Adams  measures  mortality 
at  days  2,  4,  7,  9,  11,  14,  16,  18  and  21.  Chapman  and  Goulden  also 
measure  mortality  at  nine  time  points  during  the  test.  These  intermediate 
mortality  responses  can  be  used  to  separate  inferences  about  overall 
mortality  into  inferences  about  early  life  stage  mortality  and  later  life 
stage  mortality.  One  simple  way  of  doing  this  is  indicated  below.  While 
no  new  statistical  issues  arise,  the  additional  information  obtained  can 
add  biological  insights  and  support  or  refute  various  conjectures  about 
causes  of  mortality  and  causes  of  variation  in  mortality  among  beakers. 

LeBlanc  measured  mortality  on  days  7,  14  and  21.  Thus  the  overall  21 
day  mortality  can  be  decomposed  into  7  day  mortality  (early  life  mortality) 
and  21  day  mortality  conditional  on  survival  for  7  days  (later  life 
mortality).  This  second  measure  of  mortality  eliminates  the  influences 
of  the  early  mortality.  Similarities  or  differences  in  the  patterns  across 
beakers  and  groups  of  these  two  components  of  mortality  would  tend  to 
support  or  refute  various  conjectures  about  the  reasons  for  the  observed 
mortality. For example, we observed strong dichotomies in 21 day mortality
rates among beakers in group 6 of LeBlanc's Test A and among beakers in
group 7 of LeBlanc's Test B. Did these dichotomies occur in early stage
mortality, in later stage mortality, or in both? To study conditional 21
day mortality given 7 day survival, we simply define the mortality rate as
1 minus the number alive after 21 days divided by the number alive after
7 days. We then proceed as before. The only technical difference might be
unequal "sample" sizes (numbers alive at day 7) among beakers within groups
or among groups. Of course, we could also condition on 14 day survival.
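The conditional rate defined above is simple to compute per beaker. In this hypothetical Python helper, the day 7 survivors play the role of the sample size and the deaths between days 7 and 21 play the role of the response count in a heterogeneity test; the check uses beakers C and D of group 6 in Test A, described later in this section (7 survivors with 2 later deaths, and 4 survivors with 3 later deaths).

```python
def conditional_counts(alive_day7, alive_day21):
    """Deaths after day 7 and the day 7 'sample size' for a conditional analysis."""
    if not 0 <= alive_day21 <= alive_day7:
        raise ValueError("need 0 <= alive at day 21 <= alive at day 7")
    return alive_day7 - alive_day21, alive_day7


def conditional_mortality(alive_day7, alive_day21):
    """21 day mortality given survival to day 7: 1 - alive(21)/alive(7)."""
    deaths, n = conditional_counts(alive_day7, alive_day21)
    if n == 0:
        raise ValueError("no survivors at day 7; the conditional rate is undefined")
    return deaths / n


beaker_c = conditional_mortality(7, 5)   # 2 of 7 day 7 survivors died, about 29 percent
beaker_d = conditional_mortality(4, 1)   # 3 of 4 day 7 survivors died, 75 percent
```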

We  illustrate  the  use  of  such  conditional  measures  of  mortality  by 
testing  for  beaker  to  beaker  heterogeneity  within  groups  in  LeBlanc's 
Tests  A  and  B  using  EXAX2. 

LeBlanc Test A 21 Day Mortality Conditional on 7 Day Survival

The  results  from  the  EXAX2  calculations  are  shown  in  Figures  III. 14  to 
III. 20.  They  are  summarized  below. 

LE BLANC TEST A 21 DAY MORTALITY CONDITIONAL ON 7 DAY SURVIVAL

Trt  Method  Chi Sq    Λi       -2 ln Λi  E(-2 ln Λi)  Var(-2 ln Λi)
 1   exact    0.0000   1.0000   0.0000    1.66087      3.63660
 2   exact    3.8095   0.6105   0.9869    0.8687       1.6419
 3   exact    7.7714   0.0757   5.1619    1.5750       3.4893
 4   exact    4.2270   0.3445   2.1311    1.5068       3.1226
 5   exact    5.4795   0.1975   3.2436    1.5068       3.1226
 6   exact   12.5778   0.0069   9.9396    1.9265       3.9809
 7   exact    2.0000   1.0000   0.0000    0.0000       0.0000

Σ(-2 ln Λi) = 21.4632    ΣE(-2 ln Λi) = 9.0447    ΣVar(-2 ln Λi) = 18.9938
Z = 2.2433    P(Z > 2.2433) = 0.0124

We can determine 7 day survival from the "TOTAL" column in Figures III.14
to III.20. Figures III.14 to III.18 show that there was no observed early
mortality in groups 1-5. Figure III.19 (group 6) shows a strong beaker to
beaker dichotomy in 7 day mortality, very similar to that observed in the
21 day mortality. Figure III.20 shows almost complete mortality in group 7 by
the seventh day. Thus the results in groups 1-5 and 7 with respect to
conditional 21 day mortality are in complete agreement with those for
unconditional 21 day mortality, namely no evidence of beaker to beaker
heterogeneity, except perhaps a suggestion in group 3. The heterogeneity chi
square in group 6 is very large and highly statistically significant,
although nowhere near as large as that for the unconditional 21 day
mortality. This is quite important because it tells us that the observed
high mortality rates in beakers C and D and the relatively low rates in
beakers A and B represent a persistent pattern throughout the entire
duration of the test. Of the seven survivors beyond day 7 in beaker C, two
died (a 29 percent conditional mortality rate). Of the four survivors in
beaker D, three died (a 75 percent conditional mortality rate). Beakers A
and B had 10 percent and 6 percent conditional mortality rates,
respectively. Thus whatever caused the dichotomy in mortality rates occurred
either early in the test (thereby stressing the daphnids early) or else
persistently throughout the test. Beaker D experienced by far the worst
mortality rates, both in the early and the later stages of the test.

Overall there is evidence of significant beaker to beaker heterogeneity,
due primarily to group 6 (P[Z > 2.243] = 0.012). We will study the
magnitude of this variation in greater detail in later sections.

LeBlanc Test B 21 Day Mortality Conditional on 7 Day Survival

The situation is similar to that for Test A. There is little 7 day
mortality in any of the beakers in groups 1-6. Specifically, the observed
numbers of deaths at 7 days, by beaker, for each of these groups are
(0, 1, 0, 0), (1, 2, 1, 1), (0, 1, 0, 2), (0, 0, 0, 0), (0, 0, 0, 0) and
(0, 0, 0, 0). The conditional and unconditional 21 day mortality results
should thus be similar in those groups. The results from the EXAX2
calculations are summarized below. Those for group 7 are shown in
Figure III.21.

LE BLANC TEST B 21 DAY MORTALITY GIVEN 7 DAY SURVIVAL

Trt  Method  Chi Sq    Λi       -2 ln Λi  E(-2 ln Λi)  Var(-2 ln Λi)
 1   exact    9.0690   0.0312    6.9371   1.8166       3.7415
 2   exact    1.1096   0.8640    0.2925   1.8466       3.8394
 3   exact    3.0928   0.4805    1.4658   1.0412       1.4213
 4   exact    5.7600   0.1591    3.6768   1.3590       2.6405
 5   exact    1.7451   0.7153    0.6702   1.7273       3.6654
 6   exact    4.4444   0.2521    2.7562   1.5361       3.4424
 7   exact   16.0178   0.0021   12.3604   1.8417       3.8074

Σ(-2 ln Λi) = 28.1589    ΣE(-2 ln Λi) = 11.1686    ΣVar(-2 ln Λi) = 22.5580
Z = 2.7647    P(Z > 2.7647) = 0.0028

Figure  III. 21  shows  a  strong  beaker  to  beaker  dichotomy  in  7  day 
mortality  in  group  7,  with  beakers  A  and  D  exhibiting  substantially  higher 
mortality  than  that  in  beakers  B  and  C.  Beakers  A  and  D  also  exhibit 
substantially  greater  later  stage  mortality  rates,  especially  beaker  D. 

It is interesting to note again that the beakers that exhibit the greater
early stage mortality also exhibit the greatest later stage mortality. Thus
the cause of the dichotomous mortality rates either occurred early in the
test or else persisted throughout the test. These results are in direct
correspondence with those from group 6 of Test A, which leads to the
conjecture of a common cause for the dichotomous results in each group.

Overall  there  is  evidence  of  statistically  significant  beaker  to  beaker 
heterogeneity,  due  primarily  to  group  7.  We  will  study  the  magnitude  of 
this  variation  in  greater  detail  in  later  sections. 
Figure III.1. EXAX2 output. LeBlanc Test A Group

Figure III.2. EXAX2 output. LeBlanc Test A Group

Figure III.3. EXAX2 output. LeBlanc Test A Group

Figure III.6. EXAX2 output. LeBlanc Test A Group

Figure III.7. EXAX2 output. LeBlanc Test A Group 7 and combination of results across groups

GOULDEN SURVIVAL DATA - REPS 1-7 (POOLED) VS. REPS 8-10 (POOLED)

Figure III.8. EXAX2 output. Goulden - Isophorone. Group 1. Comparison of mortality
between individually and multiply housed daphnids.

Figure III.9. EXAX2 output. Goulden - Isophorone. Group 2. Comparison of mortality
between individually and multiply housed daphnids.

Figure III.10. EXAX2 output. Goulden - Isophorone. Group 3. Comparison of mortality
between individually and multiply housed daphnids.

Figure III.11. EXAX2 output. Goulden - Isophorone. Group 4. Comparison of mortality
between individually and multiply housed daphnids.

Figure III.12. EXAX2 output. Goulden - Isophorone. Group 5. Comparison of mortality
between individually and multiply housed daphnids.

Figure III.14. EXAX2 output. LeBlanc Test A. Group 1. 21 day mortality conditional
on 7 day survival.

Figure III.15. EXAX2 output. LeBlanc Test A. Group 2. 21 day mortality
conditional on 7 day survival.

Figure III.17. EXAX2 output. LeBlanc Test A. Group 4. 21 day mortality
conditional on 7 day survival.

Figure III.18. EXAX2 output. LeBlanc Test A. Group
21 day mortality conditional on 7 day survival.

Figure III.19. EXAX2 output. LeBlanc Test A. Group
21 day mortality conditional on 7 day survival.

Figure III.20. EXAX2 output. LeBlanc Test A. Group
21 day mortality conditional on 7 day survival.

Figure III.21. EXAX2 output. LeBlanc Test B. Group 7. 21 day
mortality conditional on 7 day survival. Combination
of results across groups.


IV. TESTING FOR BEAKER TO BEAKER HETEROGENEITY
WITHIN TREATMENT GROUPS - LENGTH DATA


A.  BACKGROUND 

In  the  previous  section  we  discussed  the  comparison  of  mortality  rates 
across  beakers  within  groups  to  test  for  heterogeneity  of  responses.  Similar 
comparisons  might  be  made  for  the  production  and  length  responses.  There  is 
difficulty, however, with respect to comparisons of production measurements.
Since  production  is  determined  and  reported  on  a  beaker  basis,  there  is  just 
one  determination  per  beaker  and  no  internal  estimate  of  its  variability. 

In  order  to  test  for  beaker  to  beaker  heterogeneity  we  either  need  internal 
estimates  of  variability  among  daphnids  within  beakers,  such  as  we  have  for 
the  length  data,  or  else  we  need  external  estimates  of  variability  based  on 
a  theoretical  model,  such  as  we  have  for  the  survival  data  (i.e.  binomial 
model).  Thus  unless  we  can  hypothesize  a  theoretical  statistical  model 
that  should  govern  production  as  a  function  of  dose  and  time,  we  have  no 
alternative  but  to  carry  out  further  statistical  analyses  on  a  beaker  basis. 

The  situation  is  different  with  respect  to  length  data.  Lengths  are 
measured  and  reported  on  a  per  daphnid  basis.  We  can  thus  compare  average 
lengths  across  beakers  by  analysis  of  variance  techniques,  using  variability 
among  daphnids  within  beakers  as  an  error  yardstick.  We  carry  out  such 
comparisons  below  for  LeBlanc's  Tests  A  and  B.  Adams  and  Goulden  report  no 
length  data.  In  Chapman's  test  there  is  just  one  daphnid  per  beaker,  so 
there  is  nothing  to  compare. 

B. ANALYSIS OF VARIANCE TESTS FOR HETEROGENEITY AMONG BEAKERS
WITHIN GROUPS IN LE BLANC'S TESTS A AND B

In  this  subsection  we  illustrate  the  comparisons  of  average  21  day 
lengths  among  replicate  beakers  within  groups. 

LeBlanc  Test  A  21  Day  Lengths 


There are a water control group, a solvent control group, and five
treatment groups. Group 7, the highest treatment group, had just one
survivor, so there are no comparisons to be made within it. Comparisons
among beakers within groups can be made by fitting a two way nested analysis
of variance model to the data. Alternatively, one way analysis of variance
models can be fitted separately to the responses within groups and the
results pooled across groups. This was done for the data from groups 1-6.
The results of these calculations are shown in Figures IV.1 to IV.7 and
are summarized below. The average lengths within beakers are displayed
graphically in Figure II.26 and the standard deviations are displayed in
Figure II.27. The standard deviation plot does not suggest any particular
standard transformation of the lengths before analysis, so we carry out
comparisons on the untransformed lengths.


LE BLANC TEST A 21 DAY LENGTHS

        Between                   Within                   Within-Group
Trt     Beakers SS    d.f.        Beakers SS    d.f.       Significance Level
 1        1.2335        3           8.1753        64           0.029
 2        0.1941        3           7.0090        73           0.571
 3        1.9310        3           8.3227        66           0.003
 4        0.5636        3           4.8759        69           0.055
 5        0.8066        3           8.3178        69           0.092
 6        0.1708        3           5.4292        36           0.770
Total     4.8996       18          42.1299       377

ANOVA TABLE (NESTED)

Source                          d.f.   Sum of Squares   Mean Square
Between Groups                    5        5.5815          1.1163
Between Beakers Within Groups    18        4.8996          0.2722
Within Beakers                  377       42.1299          0.1118
Total                           400       52.6110

We test for significant beaker to beaker variation within groups by comparing
the between beakers within groups mean square to the within beakers mean
square. This ratio has an F distribution with 18 and 377 d.f. under the
null hypothesis of no beaker to beaker variation.

$$F = \frac{0.2722}{0.1118} = 2.435, \quad \text{significant at } \alpha = 0.001.$$

There is thus strong statistical evidence of beaker to beaker variation in
lengths within groups. This is not a property of just one group or just
one beaker, since four of the six groups (groups 1, 3, 4 and 5) show F ratios
significant at the 0.10 level based on just the data within those groups.
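The pooling of the per-group one way analyses into the nested table can be sketched as follows. `pooled_nested_anova` is a hypothetical helper, and its inputs are the per-group sums of squares and degrees of freedom from the length tables for Tests A and B. It recomputes F of about 2.435 for Test A and about 1.952 for Test B, and the pooled within beakers sum of squares for Test A comes to 42.1299. The upper tail probability of F can then be read from F tables or, if SciPy is available, obtained from `scipy.stats.f.sf(F, df1, df2)`.

```python
def pooled_nested_anova(per_group):
    """Pool one way ANOVAs fitted separately within each treatment group.

    per_group holds (between_beakers_ss, between_df, within_beakers_ss, within_df)
    for each group; the pooled between beakers term is "beakers within groups".
    """
    ss_b = sum(g[0] for g in per_group)
    df_b = sum(g[1] for g in per_group)
    ss_w = sum(g[2] for g in per_group)
    df_w = sum(g[3] for g in per_group)
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_between": ss_b, "df_between": df_b, "ms_between": ms_b,
            "ss_within": ss_w, "df_within": df_w, "ms_within": ms_w,
            "F": ms_b / ms_w}


# Per-group results from the LeBlanc Test A and Test B length tables
TEST_A = [(1.2335, 3, 8.1753, 64), (0.1941, 3, 7.0090, 73), (1.9310, 3, 8.3227, 66),
          (0.5636, 3, 4.8759, 69), (0.8066, 3, 8.3178, 69), (0.1708, 3, 5.4292, 36)]
TEST_B = [(0.1633, 3, 6.5761, 68), (1.6418, 3, 7.5000, 63), (1.6130, 3, 14.8211, 72),
          (0.3889, 3, 9.7439, 71), (0.1878, 3, 6.4462, 63), (1.4236, 3, 11.0764, 68),
          (0.266, 3, 3.6130, 26)]

test_a = pooled_nested_anova(TEST_A)   # F about 2.435 on 18 and 377 d.f.
test_b = pooled_nested_anova(TEST_B)   # F about 1.952 on 21 and 431 d.f.
```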

LeBlanc  Test  B  21  Day  Lengths 

The  layout  of  the  test  is  similar  to  that  for  Test  A.  Since  treatment 
group  7  has  more  survivors  in  Test  B  than  in  Test  A,  we  include  it  in  the 
comparisons  here.  As  before,  one  way  analysis  of  variance  models  were  fitted 
separately  to  the  responses  within  groups  and  the  results  pooled  across 
groups  to  yield  a  nested  two  way  analysis  of  variance.  The  results  of  these 
calculations are shown in Figures IV.8 to IV.15 and are summarized below.

The average lengths within beakers are displayed graphically in Figure II.29
and the standard deviations are displayed in Figure II.30. The standard
deviations do not seem to vary with length, and we again perform no
transformations on the lengths.

LE BLANC TEST B 21 DAY LENGTHS

        Between                   Within                   Within-Group
Trt     Beakers SS    d.f.        Beakers SS    d.f.       Significance Level
 1        0.1633        3           6.5761        68           0.641
 2        1.6418        3           7.5000        63           0.006
 3        1.6130        3          14.8211        72           0.058
 4        0.3889        3           9.7439        71           0.424
 5        0.1878        3           6.4462        63           0.610
 6        1.4236        3          11.0764        68           0.041
 7        0.266         3           3.6130        26           0.598
Total     5.6844       21          59.7767       431

ANOVA TABLE (NESTED)

Source                          d.f.   Sum of Squares   Mean Square
Between Groups                    6        1.7490          0.2915
Between Beakers Within Groups    21        5.6844          0.2707
Within Beakers                  431       59.7767          0.1387
Total                           458       67.2098

We test for significant beaker to beaker variation within groups by comparing
the between beakers within groups mean square to the within beakers mean square.
This ratio has an F distribution with 21 and 431 d.f. under the null
hypothesis of no beaker to beaker variation.

$$F = \frac{0.2707}{0.1387} = 1.952, \quad \text{significant at } \alpha = 0.006.$$

There  is  thus  strong  statistical  evidence  of  beaker  to  beaker  variation  in 
lengths  within  groups.  This  result  is  in  direct  agreement  with  that  for 
Test  A. 


In  summary,  we  have  evidence  of  beaker  to  beaker  variation  in  lengths 
in  both  Tests  A  and  B.  Subsequent  analyses  will  have  to  account  for  this. 
Figure IV.2. ANOVA output. LeBlanc Test A. Group 1 (Water Control).

Figure IV.5. ANOVA output. LeBlanc Test A. Group

Figure IV.7. ANOVA output. LeBlanc Test A. Group

Figure IV.8. LeBlanc Test B. ANOVA of lengths pooled
over beakers within groups.

Figure IV.9. ANOVA output. LeBlanc Test B. Group 1 (Water Control).

Figure IV.12. ANOVA output. LeBlanc Test B. Group

Figure IV.13. ANOVA output. LeBlanc Test B. Group
V. ADJUSTMENTS TO ACCOUNT FOR BEAKER TO BEAKER
HETEROGENEITY  WITHIN  TREATMENT  GROUPS 


A.  BACKGROUND 

In  the  previous  two  sections  we  have  discussed  procedures  to  test  for 
beaker  to  beaker  heterogeneity  within  treatment  groups  for  survival  and  for 
length  responses.  In  both  sections  we  found  statistical  evidence  of  beaker 
to  beaker  heterogeneity.  In  this  section  we  discuss  either  adjustments  in 
the  data  or  adjustments  in  the  statistical  procedures  used,  in  order  to 
account  for  this  heterogeneity.  As  remarked  in  the  previous  section,  we 
need  not  adjust  the  productivity  data  since  this  is  measured  and  reported 
on  a  per  beaker  basis  and  there  is  no  obvious  theoretical  model  upon  which 
to  base  variability  estimates.  We  thus  carry  out  statistical  analyses  on 
a per beaker basis. In the subsections below we discuss procedures to adjust
the mortality responses and the length responses prior to comparison across
groups.

B.  ADJUSTMENTS  TO  ACCOUNT  FOR  HETEROGENEITY  OF  MORTALITY  RESPONSES 

We  tested  for  beaker  to  beaker  heterogeneity  in  mortality  in  Section  III. 
We  found  some  evidence  of  heterogeneity  in  the  data  from  LeBlanc's  Tests  A 
and  B.  There  was  no  heterogeneity  of  responses  in  Adams'  data  (also 
virtually no partial kills). Chapman's data consist of just a single
daphnid  per  beaker,  so  beaker  to  beaker  variation  cannot  be  determined. 
Goulden's  isophorone  data  show  no  evidence  of  heterogeneity  of  responses 
within  groups,  among  the  beakers  with  multiple  daphnids.  We  thus  need 
to  account  for  beaker  to  beaker  heterogeneity  in  mortality  in  LeBlanc's 
data  but  not  in  Adams',  Chapman's  or  Goulden's  data. 

The most commonly used approach for the comparison of mortality rates
across treatment groups is to carry out an arcsine variance stabilizing
transformation on the observed mortality rate within each beaker and then
use the mean square for variation among beakers within groups (pooled over
all groups) as an error yardstick. If there are I groups and J beakers per
group, then this error yardstick has I(J-1) degrees of freedom associated
with it.

This is a conservative approach. Although the beakers within groups mean
square is a valid error term whether or not beaker to beaker variation in
mortality exists, it is based on relatively few degrees of freedom and thus
can lead to diminished sensitivity of inferences (i.e. lowered power of
tests, increased lengths of confidence intervals) if I(J-1) is small. For
example, if I = 6 and J = 3, then I(J-1) = 12. The sensitivity of the
analyses would be improved if the degrees of freedom for the error
yardstick could somehow be increased.

Feder and Collins [1], Section IX, discuss an approach to accounting for
the increased variability introduced by tank to tank heterogeneity in
fathead minnow mortality rates by reducing actual sample sizes per group to
effective sample sizes and then disregarding the tank effects. A very
similar situation exists for the Daphnia mortality responses, and so we
adopt a similar approach here. Following Feder and Collins, suppose that
there are 20 daphnids per beaker and a "true" mortality rate of $p$.

Under binomial theory the variance of $\hat{p}$, the estimated mortality
rate, should be $p(1-p)/20$. Suppose, however, that the responses within
each beaker are positively correlated due to beaker to beaker heterogeneity,
and that this increases the variance of $\hat{p}$ by 20 percent to
$1.2\,p(1-p)/20$. Then we can regard the effective sample size within that
beaker as $20/1.2 = 16.67$.

To maintain the observed response rate at its level $\hat{p}$, we adjust both
the number dead and the number alive by the same factor. For example, if the
data as reported show 5 deaths in 20 daphnids, we would adjust this down to
5/1.2 = 4.17 deaths in 20/1.2 = 16.67 daphnids. Beaker to beaker
heterogeneity is then ignored, effective numbers of responses and daphnids
are pooled across beakers within groups, and standard binomial based
procedures are applied to the adjusted data as if no beaker to beaker
variation within groups existed.
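A minimal sketch of the effective sample size adjustment, assuming a single inflation factor K that is already known (its estimation is taken up next). The helpers and the two-beaker counts in the usage line are hypothetical. Dividing deaths and sample size by the same K leaves the observed rate unchanged while multiplying the ordinary binomial standard error by $K^{1/2}$.

```python
def adjust_for_heterogeneity(deaths, n, inflation):
    """Divide both the number dead and the sample size by the inflation factor K."""
    if inflation < 1:
        raise ValueError("the variance inflation factor should be at least 1")
    return deaths / inflation, n / inflation


def pooled_adjusted_rate(beakers, inflation):
    """Pool effective deaths and effective sample sizes over the beakers of a group."""
    effective = [adjust_for_heterogeneity(d, n, inflation) for d, n in beakers]
    eff_deaths = sum(d for d, _ in effective)
    eff_n = sum(n for _, n in effective)
    p = eff_deaths / eff_n
    # Standard binomial standard error, computed on the effective sample size
    se = (p * (1 - p) / eff_n) ** 0.5
    return eff_deaths, eff_n, p, se


# The worked example in the text: 5 deaths in 20 daphnids with K = 1.2
example = adjust_for_heterogeneity(5, 20, 1.2)     # about (4.17, 16.67)
# Hypothetical group of two beakers with 5 and 3 deaths out of 20 each
group = pooled_adjusted_rate([(5, 20), (3, 20)], 1.2)
```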

We  now  consider  the  calculation  of  adjustment  factors.  Adjustment 
factors  can  be  calculated  separately  for  each  group  or  a  single  adjustment 
factor  can  be  calculated  for  all  the  groups  combined.  We  first  consider 
the  calculation  of  a  single  adjustment  factor  and  then  consider  displays 
that  suggest  whether  single  or  separate  adjustments  are  called  for. 

Motivation for the adjustment procedure comes from the form of the beta
binomial model (Williams [3]). Suppose $X_{ij}$ is the number of responses in
beaker $j$ of group $i$ (e.g. number dead after 21 days). The beta binomial
model extends the binomial to allow for beaker to beaker variation within
groups. Following Feder and Collins [1], Section IX, we assume that
$X_{ij} \sim \text{Binomial}(N_{ij}, p_{ij})$, where $N_{ij}$ is the sample
size within beaker $j$ of group $i$ and $p_{ij}$ is the response probability
there. Further, it is assumed that $p_{ij} \sim \text{Beta}(\alpha_i, \beta_i)$,
where $\alpha_i$ and $\beta_i$ are unknown parameters.

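Let

$$\mu_i = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \theta_i = \frac{1}{\alpha_i + \beta_i}.$$

Then it is known that $X_{ij}$ has a beta binomial distribution with sample
size $N_{ij}$ and parameters $\mu_i$ and $\theta_i$. See Williams [3] for
details. In particular, it can be shown directly that

$$E(X_{ij}) = N_{ij}\,\mu_i, \qquad \operatorname{Var}(X_{ij}) = N_{ij}\,\mu_i(1-\mu_i)\,\frac{1 + N_{ij}\theta_i}{1 + \theta_i}.$$

We see that the variance of $X_{ij}$ is inflated over and above binomial
variance by a multiplicative factor.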


Suppose that N_ij = N_i, j = 1, ..., J_i. This assumption is often reasonable
in Daphnia tests, where N_ij represents the number of daphnids in the beaker
at the outset of the test. In fact the assumption N_ij ≡ N is often reasonable
also. (Note that in tests containing both beakers with single daphnids
and beakers with multiple daphnids, we are referring just to the beakers
with multiple daphnids. For example, in Goulden's test this discussion would
refer just to the three beakers per group containing five daphnids each.)

Then the multiplicative factor is (1 + N_i θ_i)/(1 + θ_i) ≡ K_i for j = 1, ..., J_i.
Thus

$$\mathrm{Var}(X_{ij}) = N_i\,\mu_i(1-\mu_i)\,K_i, \qquad 1 \le K_i \le N_i.$$

Define p_ij = X_ij/N_i, the observed response proportion. Therefore

$$\mathrm{Var}(p_{ij}) = \frac{K_i}{N_i}\,\mu_i(1-\mu_i), \qquad j = 1, \ldots, J_i.$$

Thus the effective sample size per beaker in the i-th group is N_i/K_i. As
the extent of beaker to beaker variation approaches 0 (i.e. as θ_i → 0), K_i
approaches 1 and so N_i/K_i approaches N_i. As the extent of beaker to beaker
variation gets greater and greater (i.e. as θ_i → ∞), K_i approaches N_i and
so N_i/K_i approaches 1. These two extreme situations call for carrying out
analyses on a per daphnid basis (after pooling responses across beakers
within groups) or on a per beaker basis, respectively. In general
some middle ground is appropriate. Note that if N_i = N and θ_i = θ for all i,
then K_i = K for all i, and K should be estimated based on results from all
the groups.
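The limiting behaviour described above can be checked numerically. The following minimal Python sketch evaluates the beta binomial inflation factor K = (1 + Nθ)/(1 + θ) and the effective sample size N/K for a few values of θ; the function names are illustrative only and do not come from any program used in this report.

```python
def inflation_factor(theta: float, n: int) -> float:
    """Beta binomial variance inflation factor K = (1 + n*theta) / (1 + theta)."""
    if theta < 0 or n < 1:
        raise ValueError("theta must be >= 0 and n must be >= 1")
    return (1.0 + n * theta) / (1.0 + theta)


def effective_sample_size(theta: float, n: int) -> float:
    """Effective number of daphnids per beaker, n / K."""
    return n / inflation_factor(theta, n)


if __name__ == "__main__":
    n = 20  # daphnids per beaker, as in LeBlanc's tests
    for theta in (0.0, 0.01, 0.1, 1.0, 100.0):
        k = inflation_factor(theta, n)
        print(f"theta = {theta:7.2f}: K = {k:6.3f}, N/K = {n / k:6.3f}")
```

With θ = 0 the factor is exactly 1 (a per daphnid analysis); as θ grows, K approaches N = 20 and N/K approaches 1 (a per beaker analysis).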

The procedure discussed below for calculating adjustment factors is
motivated by the beta binomial theory results. A full-fledged maximum
likelihood procedure to fit the beta binomial model to the data might
be developed, but we decided to use a simpler procedure based on the
method of moments.

Let X_ij and N_ij denote the number of responses and the total number of
daphnids, respectively, within beaker j of group i, and let p_ij ≡ X_ij/N_ij.
The variance inflation factor, K_i, is defined as

$$K_i = \frac{\mathrm{Var}(p_{ij})}{\mu_i(1-\mu_i)/N_i}.$$

This suggests that K_i be estimated by substituting the sample analogues
of Var(p_ij) and μ_i in the expression above. If the N_ij's are not in fact
exactly the same for every beaker within a group, then use the average
sample size in the expression for K_i. (Use the harmonic average if the
N_ij's differ by much.)


Suppose there are I treatment and control groups, with J_i beakers in the
i-th group. Within the i-th group let

$$\bar N_i = J_i^{-1}\sum_j N_{ij}, \qquad \bar p_i = J_i^{-1}\sum_j p_{ij}, \qquad \widehat{\mathrm{Var}}(p_{ij}) = (J_i-1)^{-1}\sum_j \left(p_{ij}-\bar p_i\right)^2$$

denote the average sample size, the average observed response rate, and
the sample variance of response rates, respectively.

The N_ij's are generally nearly equal, if not exactly equal, in daphnid
toxicity test data. We estimate K_i as

$$\hat K_i = \frac{\widehat{\mathrm{Var}}(p_{ij})}{\bar p_i(1-\bar p_i)/\bar N_i}.$$
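A minimal Python sketch of the moment estimate K̂_i for a single group, assuming the beaker proportions and beaker sizes are supplied as lists. Following the text, the average beaker size is the arithmetic mean unless the N_ij's differ by much, in which case the harmonic mean can be requested. When p̄_i is 0 or 1, numerator and denominator both vanish; the sketch then returns 1.00, the convention adopted later in this report for an indeterminate factor. The function name is hypothetical.

```python
from statistics import harmonic_mean, mean, variance


def estimate_inflation_factor(props, sizes, harmonic=False):
    """Method-of-moments estimate of K_i for one group.

    props: observed response proportions p_ij, one per beaker
    sizes: numbers of daphnids N_ij, one per beaker
    harmonic: use the harmonic rather than the arithmetic mean beaker size
    """
    if len(props) != len(sizes) or len(props) < 2:
        raise ValueError("need matching props and sizes for at least two beakers")
    p_bar = mean(props)
    n_bar = harmonic_mean(sizes) if harmonic else mean(sizes)
    s2 = variance(props)  # sample variance with divisor J_i - 1
    binomial_var = p_bar * (1.0 - p_bar) / n_bar
    if binomial_var == 0.0:
        return 1.0  # 0/0 is indeterminate; the text defines K_i = 1.00 here
    return s2 / binomial_var


if __name__ == "__main__":
    # Group 3 of LeBlanc's Test A, 21 day mortality
    print(round(estimate_inflation_factor([0.00, 0.05, 0.25, 0.20], [20] * 4), 2))
```

For group 3 of Test A this gives about 2.59; the hand calculation later in this section, which rounds the intermediate variances, gives 2.58.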

We either pool the K̂_i's across groups to obtain an overall adjustment factor,
K̂, or else we use separate adjustment factors within each group. We will
describe later in this subsection a graphical procedure for comparing the
K̂_i's across groups. For now we consider an overall adjustment factor based
on all the groups.

Let K_i denote the inflation factor within the i-th group. If we take
p̄_i as essentially μ_i (this is reasonable unless μ_i is very close to 0 or
to 1, since p̄_i is an average value over all the daphnids in the group), then
K̂_i can be regarded as distributed approximately as K_i χ²_{J_i-1}/(J_i-1). If
K_i = K, i = 1, ..., I, then we can obtain a pooled estimate of K as

$$\hat K = \frac{\sum_i (J_i-1)\,\hat K_i}{\sum_i (J_i-1)}.$$


If J_i = J for all i, then the expression for K̂ reduces to the simple average,
Σ_i K̂_i/I. Under the above assumptions, K̂ is approximately distributed as
K χ²_{Σ_i(J_i-1)}/Σ_i(J_i-1). This distributional result can be used to provide an
upper confidence bound on K. Namely, a 95 percent upper confidence bound
on K is

$$K \le \hat K\,\frac{\sum_i (J_i-1)}{\chi^2_{\sum_i (J_i-1)}(0.05)},$$

where χ²_{Σ_i(J_i-1)}(0.05) is the 5th percentile of the chi square distribution
with Σ_i(J_i-1) d.f. Denote this upper confidence bound by K_u.
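The pooled estimate K̂ and the bound K_u can be sketched in Python as follows. To keep the example free of third-party packages, the chi square percentile is obtained by bisection on the series expansion of the regularized lower incomplete gamma function; this numerical approach is our own choice, and any statistics library's chi square quantile function could be used instead. The function names are hypothetical.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Chi square c.d.f. P(df/2, x/2) from the lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_ppf(q: float, df: float) -> float:
    """Chi square percentile: solve chi2_cdf(x, df) = q by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pooled_inflation_factor(k_hats, beakers):
    """K-hat = sum (J_i - 1) K_i / sum (J_i - 1)."""
    weights = [j - 1 for j in beakers]
    return sum(w * k for w, k in zip(weights, k_hats)) / sum(weights)


def upper_confidence_bound(k_pooled, beakers, alpha=0.05):
    """K_u = K-hat * nu / chi2_nu(alpha), with nu = sum (J_i - 1)."""
    nu = sum(j - 1 for j in beakers)
    return k_pooled * nu / chi2_ppf(alpha, nu)


if __name__ == "__main__":
    k_hats = [0.00, 1.28, 2.58, 1.40, 1.83, 13.76, 1.02]  # LeBlanc Test A
    beakers = [4] * 7
    k = pooled_inflation_factor(k_hats, beakers)
    print(f"K = {k:.2f}, K_u = {upper_confidence_bound(k, beakers):.2f}")
```

With the Test A factors this gives K̂ ≈ 3.12 and K_u ≈ 5.66; the value 5.65 in the worked example comes from rounding K̂ to 3.12 and the percentile χ²_21(0.05) to 11.6.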


The suggested adjustment procedure is to reduce the effective sample
size within the ij-th beaker to N_ij/K̂ in such a way that p_ij is unchanged.
(The ratio N_ij/K̂ is constrained to lie between 1 and N_ij.) Then
ignore beaker to beaker variation, pool results across beakers within
groups, and carry out subsequent analyses based on pooled results within
groups.
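A minimal Python sketch of the per beaker adjustment, assuming counts of dead and total daphnids for each beaker and a single factor K. The observed proportion is preserved and the effective size N_ij/K is clamped to the interval [1, N_ij] stated above; pooling within a group then amounts to summing the adjusted counts. The function names are illustrative.

```python
def adjust_beaker(dead: float, total: float, k: float):
    """Return (dead, live, total) scaled to the effective size total / k.

    The response proportion dead / total is unchanged, and the effective
    size is constrained to lie between 1 and total.
    """
    if total <= 0 or k <= 0:
        raise ValueError("total and k must be positive")
    n_eff = min(max(total / k, 1.0), total)
    p = dead / total
    return p * n_eff, (1.0 - p) * n_eff, n_eff


def pool_group(beakers, k: float):
    """Pool adjusted (dead, total) counts across the beakers of one group."""
    adjusted = [adjust_beaker(d, n, k) for d, n in beakers]
    dead = sum(a[0] for a in adjusted)
    total = sum(a[2] for a in adjusted)
    return dead, total


if __name__ == "__main__":
    # 5 deaths among 20 daphnids with K = 1.2, the example at the start of this discussion
    print(adjust_beaker(5, 20, 1.2))
```

For the earlier example this returns about 4.17 dead, 12.5 live and 16.67 in total.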


When comparing treatment effects based on the adjusted data, how many
degrees of freedom should be associated with the error yardstick corresponding
to variation among daphnids within groups? If we use binomial
theory, then we are assuming an infinite number of degrees of freedom
for this yardstick. This would be appropriate if and only if K were
known, which is not usually the case. At the other extreme, we could
argue that K̂ is based on Σ_i(J_i-1) degrees of freedom and so this number should
be associated with the error yardstick. If J_i = J for all i, this becomes
I(J-1) degrees of freedom. Adjusting the sample sizes by K̂ and then using
Σ_i(J_i-1) or I(J-1) degrees of freedom, as the case may be, is very close
to Finney's [4] suggestion of using a heterogeneity factor based
on the residual mean square from the (probit) model fit. This is a
conservative viewpoint and uses no information about the relation between
the observed beaker to beaker variation and that predicted by binomial
theory. Diminished sensitivity can result when I(J-1) is small. This
can happen particularly if J = 2 or 3.

A middle ground between the two extremes discussed above could be based
on the following reasoning. (Assume for the purpose of discussion that N_ij = N
and J_i = J. This simplifies notation; however, the ideas are more general.)

Since we are assuming an effective sample size of JN/K per group and pooling
data across beakers, each group provides JN/K - 1 degrees of freedom for
estimating variability. Now K_u is an upper confidence bound on K. Since
K is unknown, we substitute this upper bound for it and thus assume
JN/K_u - 1 degrees of freedom per group, or I(JN/K_u - 1) degrees of freedom
altogether. If K_u = 1 then we have I(JN-1) degrees of freedom and we have
made no adjustment. If K_u = N then there is effectively one observation
per beaker and so we have I(J-1) degrees of freedom, just as we associated
with per beaker analyses or with using a heterogeneity factor. The quantity
I(JN/K_u - 1) is constrained to lie between I(J-1) and I(JN-1).
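The degrees-of-freedom rule fits in a single function. This Python sketch, with a hypothetical function name, evaluates I(JN/K_u - 1) and applies the stated limits I(J - 1) and I(JN - 1).

```python
def error_degrees_of_freedom(n_groups: int, beakers: int, per_beaker: int,
                             k_upper: float) -> float:
    """I(JN/K_u - 1), constrained to lie between I(J - 1) and I(JN - 1)."""
    i, j, n = n_groups, beakers, per_beaker
    df = i * (j * n / k_upper - 1.0)
    return min(max(df, i * (j - 1)), i * (j * n - 1))


if __name__ == "__main__":
    for k_u in (1.0, 4.45, 5.65, 20.0):
        print(k_u, round(error_degrees_of_freedom(7, 4, 20, k_u), 1))
```

With I = 7, J = 4 and N = 20, K_u = 1 gives I(JN - 1) = 553 degrees of freedom and K_u = N = 20 gives I(J - 1) = 21, the two extremes described above.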


As an alternative to adjusting the sample sizes to effective sample
sizes, N/K, we can carry out comparisons on a per beaker basis (after
performing an arcsine variance stabilizing transformation) and use the mean
square for beakers within groups as an error yardstick. The usual practice
is to associate I(J-1) degrees of freedom with this yardstick. A less
conservative practice would be to compare the magnitude of this yardstick
with that expected based on binomial theory and pool this information to
arrive at the increased number of degrees of freedom I(JN/K_u - 1). This will
increase the sensitivity of comparisons among treatment effects, particularly
when I(J-1) is rather small. Note that N/K_u is constrained to lie between
1 and N.
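The per beaker alternative can be sketched in Python as follows. The sketch transforms each beaker proportion to arcsin(sqrt(p)) (in radians), computes the mean square for beakers within groups with Σ_i(J_i - 1) degrees of freedom, and divides it by 1/(4N), the approximate binomial variance of the transformed proportion given by the delta method. Reading that ratio as a rough analogue of K is our own illustration of comparing the yardstick with binomial theory, not a procedure specified in this report; the function names are hypothetical.

```python
import math
from statistics import mean


def arcsine_within_group_ms(groups):
    """Mean square for beakers within groups on the arcsin(sqrt(p)) scale.

    groups: list of lists of beaker proportions, one inner list per group.
    Returns (mean square, degrees of freedom).
    """
    ss, df = 0.0, 0
    for props in groups:
        y = [math.asin(math.sqrt(p)) for p in props]
        y_bar = mean(y)
        ss += sum((v - y_bar) ** 2 for v in y)
        df += len(y) - 1
    return ss / df, df


def ratio_to_binomial(ms: float, per_beaker: int) -> float:
    """Ratio of the observed mean square to its approximate binomial value 1/(4N)."""
    return ms / (1.0 / (4.0 * per_beaker))
```

A ratio near 1 suggests little extrabinomial variation; a ratio well above 1 points in the same direction as a large K̂.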

We  now  apply  this  procedure  to  the  mortality  data  from  LeBlanc's  Tests  A 
and  B.  From  the  preliminary  tests  of  beaker  to  beaker  heterogeneity  in 
Section  III  we  concluded  that  there  is  strong  statistical  evidence  of 
heterogeneity  in  each  test.  (However  the  heterogeneity  may  be  in  just  a 
single  group  in  each  case.) 

LeBlanc  Test  A 

Group 1: p_11 = p_12 = p_13 = p_14 = 0.15, p̄_1 = 0.15,
N_11 = N_12 = N_13 = N_14 = N_1 = 20, Var(p_1j) = 0.00,
p̄_1(1 - p̄_1)/N_1 = 0.00638,
K_1 = Var(p_1j)/[p̄_1(1 - p̄_1)/N_1] = 0.00/0.00638 = 0.00


Group 2: p_21 = 0.10, p_22 = p_23 = 0.00, p_24 = 0.05, p̄_2 = 0.0375,
N_21 = N_22 = N_23 = N_24 = N_2 = 20, Var(p_2j) = 0.0023,
p̄_2(1 - p̄_2)/N_2 = 0.00180,
K_2 = Var(p_2j)/[p̄_2(1 - p̄_2)/N_2] = 0.0023/0.0018 = 1.28


Group 3: p_31 = 0.00, p_32 = 0.05, p_33 = 0.25, p_34 = 0.20, p̄_3 = 0.125,
N_31 = N_32 = N_33 = N_34 = N_3 = 20, Var(p_3j) = 0.0142,
p̄_3(1 - p̄_3)/N_3 = 0.0055,
K_3 = Var(p_3j)/[p̄_3(1 - p̄_3)/N_3] = 0.0142/0.0055 = 2.58


Group 4: p_41 = 0.15, p_42 = 0.00, p_43 = 0.15, p_44 = 0.05, p̄_4 = 0.0875,
N_41 = N_42 = N_43 = N_44 = N_4 = 20, Var(p_4j) = 0.0056,
p̄_4(1 - p̄_4)/N_4 = 0.0040,
K_4 = Var(p_4j)/[p̄_4(1 - p̄_4)/N_4] = 0.0056/0.0040 = 1.40


Group 5: p_51 = 0.10, p_52 = 0.00, p_53 = 0.20, p_54 = 0.05, p̄_5 = 0.0875,
N_51 = N_52 = N_53 = N_54 = N_5 = 20, Var(p_5j) = 0.0073,
p̄_5(1 - p̄_5)/N_5 = 0.0040,
K_5 = Var(p_5j)/[p̄_5(1 - p̄_5)/N_5] = 0.0073/0.0040 = 1.825


Group 6: p_61 = 0.10, p_62 = 0.20, p_63 = 0.75, p_64 = 0.95, p̄_6 = 0.50,
N_61 = N_62 = N_63 = N_64 = N_6 = 20, Var(p_6j) = 0.1717,
p̄_6(1 - p̄_6)/N_6 = 0.0125,
K_6 = Var(p_6j)/[p̄_6(1 - p̄_6)/N_6] = 0.1717/0.0125 = 13.76


Group 7: p_71 = p_72 = 1.00, p_73 = 0.95, p_74 = 1.00, p̄_7 = 0.9875,
N_71 = N_72 = N_73 = N_74 = N_7 = 20, Var(p_7j) = 0.00063,
p̄_7(1 - p̄_7)/N_7 = 0.00062,
K_7 = Var(p_7j)/[p̄_7(1 - p̄_7)/N_7] = 0.00063/0.00062 = 1.02


It is obvious that the inflation factor from group 6 dominates all the
others. However, if we ignore this for the moment for the sake of illustration
and calculate an overall inflation factor, we obtain

$$\hat K = \frac{\sum_{i=1}^{7}\hat K_i}{7} = \frac{0.00 + 1.28 + 2.58 + 1.40 + 1.83 + 13.76 + 1.02}{7} = 3.12.$$


Under the (rather dubious) assumption that all the K_i's are equal, the
distribution of K̂ may be approximated as K χ²_{I(J-1)}/I(J-1) = K χ²_21/21. A
95 percent upper confidence bound on K would then be

$$K \le \hat K\,\frac{I(J-1)}{\chi^2_{I(J-1)}(0.05)} = \frac{(3.12)(21)}{\chi^2_{21}(0.05)} = \frac{(3.12)(21)}{11.6} = 5.65.$$

The  suggested  adjustment  procedure  is  to  reduce  the  effective  sample 
size  within  each  beaker  to  20/3.12  =  6.41  while  maintaining  the  observed 
mortality  rates,  disregard  beaker  to  beaker  variation  and  pool  results 
across  beakers  within  groups,  and  carry  out  subsequent  analyses  based 
on  the  pooled  results  within  groups. 

When comparing treatment effects we associate I(JN/K_u - 1) = 7[(4)(20)/5.65 - 1]
= 92.1 ≈ 92 degrees of freedom with the error yardstick.
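As a check on the arithmetic above, the following Python sketch recomputes the Test A factors from the beaker proportions without rounding the intermediate variances. The proportions are those of the worked calculations (for group 2 they agree with the adjusted counts in Table V.3). The chi square percentile 11.6 is the tabulated value used in the text, not a computed one. Because intermediate values are not rounded, a few factors differ from the hand calculation in the second decimal place (for example about 1.27 rather than 1.28 for group 2, and 13.73 rather than 13.76 for group 6), while K̂, K_u, the effective sample size and the degrees of freedom agree with the values above to the precision quoted.

```python
from statistics import mean, variance

# Beaker proportions p_ij, LeBlanc Test A, 21 day mortality (J = 4 beakers, N = 20)
TEST_A = {
    1: [0.15, 0.15, 0.15, 0.15],
    2: [0.10, 0.00, 0.00, 0.05],
    3: [0.00, 0.05, 0.25, 0.20],
    4: [0.15, 0.00, 0.15, 0.05],
    5: [0.10, 0.00, 0.20, 0.05],
    6: [0.10, 0.20, 0.75, 0.95],
    7: [1.00, 1.00, 0.95, 1.00],
}
N, J = 20, 4
CHI2_21_05 = 11.6  # 5th percentile of chi square with 21 d.f., as used in the text

k_hat = {}
for group, props in TEST_A.items():
    p_bar = mean(props)
    k_hat[group] = variance(props) / (p_bar * (1.0 - p_bar) / N)

n_groups = len(k_hat)
k_pooled = mean(list(k_hat.values()))            # simple average, since J_i = J
k_upper = k_pooled * n_groups * (J - 1) / CHI2_21_05
df = n_groups * (J * N / k_upper - 1.0)          # I(JN/K_u - 1)

if __name__ == "__main__":
    for group, k in k_hat.items():
        print(f"group {group}: K = {k:.2f}")
    print(f"K = {k_pooled:.2f}, K_u = {k_upper:.2f}, "
          f"N/K = {N / k_pooled:.2f}, d.f. = {df:.1f}")
```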
LeBlanc Test B


Group 1: p_11 = 0.05, p_12 = 0.10, p_13 = 0.00, p_14 = 0.25, p̄_1 = 0.10,
N_11 = N_12 = N_13 = N_14 = N_1 = 20, Var(p_1j) = 0.0117,
p̄_1(1 - p̄_1)/N_1 = 0.0045,
K_1 = Var(p_1j)/[p̄_1(1 - p̄_1)/N_1] = 0.0117/0.0045 = 2.60

Group 2: p_21 = 0.15, p_22 = 0.20, p_23 = 0.20, p_24 = 0.10, p̄_2 = 0.1625,
N_21 = N_22 = N_23 = N_24 = N_2 = 20, Var(p_2j) = 0.0023,
p̄_2(1 - p̄_2)/N_2 = 0.0068,
K_2 = Var(p_2j)/[p̄_2(1 - p̄_2)/N_2] = 0.0023/0.0068 = 0.34

Group 3: p_31 = 0.00, p_32 = 0.10, p_33 = 0.00, p_34 = 0.10, p̄_3 = 0.05,
N_31 = N_32 = N_33 = N_34 = N_3 = 20, Var(p_3j) = 0.0033,
p̄_3(1 - p̄_3)/N_3 = 0.0024,
K_3 = Var(p_3j)/[p̄_3(1 - p̄_3)/N_3] = 0.0033/0.0024 = 1.375

Group 4: p_41 = 0.15, p_42 = 0.00, p_43 = 0.00, p_44 = 0.10, p̄_4 = 0.0625,
N_41 = N_42 = N_43 = N_44 = N_4 = 20, Var(p_4j) = 0.0056,
p̄_4(1 - p̄_4)/N_4 = 0.0029,
K_4 = Var(p_4j)/[p̄_4(1 - p̄_4)/N_4] = 0.0056/0.0029 = 1.93


Group 5: p_51 = 0.10, p_52 = 0.15, p_53 = 0.25, p_54 = 0.15, p̄_5 = 0.1625,
N_51 = N_52 = N_53 = N_54 = N_5 = 20, Var(p_5j) = 0.0040,
p̄_5(1 - p̄_5)/N_5 = 0.0068,
K_5 = Var(p_5j)/[p̄_5(1 - p̄_5)/N_5] = 0.0040/0.0068 = 0.588


Group 6: p_61 = 0.10, p_62 = 0.00, p_63 = 0.20, p_64 = 0.10, p̄_6 = 0.10,
N_61 = N_62 = N_63 = N_64 = N_6 = 20, Var(p_6j) = 0.0067,
p̄_6(1 - p̄_6)/N_6 = 0.0045,
K_6 = Var(p_6j)/[p̄_6(1 - p̄_6)/N_6] = 0.0067/0.0045 = 1.49


Group 7: p_71 = 0.85, p_72 = 0.30, p_73 = 0.40, p_74 = 0.95, p̄_7 = 0.625,
N_71 = N_72 = N_73 = N_74 = N_7 = 20, Var(p_7j) = 0.1042,
p̄_7(1 - p̄_7)/N_7 = 0.0117,
K_7 = Var(p_7j)/[p̄_7(1 - p̄_7)/N_7] = 0.1042/0.0117 = 8.906


It is obvious that the inflation factor from group 7 dominates all the
others. However, we ignore this for the moment for the sake of illustration
and calculate an overall inflation factor:

$$\hat K = \frac{\sum_{i=1}^{7}\hat K_i}{7} = \frac{2.60 + 0.34 + 1.38 + 1.93 + 0.59 + 1.49 + 8.91}{7} = 2.46.$$


Under the (dubious) assumption that all the K_i's are equal, the distribution
of K̂ may be approximated as K χ²_{I(J-1)}/I(J-1) = K χ²_21/21. A 95 percent
upper confidence bound on K would then be

$$K \le \hat K\,\frac{I(J-1)}{\chi^2_{I(J-1)}(0.05)} = \frac{(2.46)(21)}{11.6} = 4.45.$$


The suggested adjustment procedure is to reduce the effective sample
size within each beaker to 20/2.46 = 8.13 while maintaining the observed
mortality rates, disregard beaker to beaker variation and pool results
across beakers within groups, and carry out subsequent analyses based on
the pooled results within groups.

When comparing treatment effects we associate I(JN/K_u - 1) = 7[(4)(20)/4.45 - 1]
= 118.8 ≈ 119 degrees of freedom with the error yardstick.

The model upon which the adjustment calculations were based assumes that
the extent of extrabinomial variation (i.e. the K_i's) is constant across
groups. Before applying adjustments based on this assumption, one should
determine whether it is realistic. There were indications, mentioned above,
that the significant beaker to beaker heterogeneity observed in LeBlanc's
Tests A and B may be due entirely to the extreme heterogeneity in group 6 for
Test A and group 7 for Test B, respectively. If the heterogeneity among
beakers differs from group to group, then separate adjustment factors should
be used within each group.

We discuss below a procedure for determining whether the extent of
observed beaker to beaker extrabinomial variation is the same for all groups
or is greater for some groups than for others. The procedure consists of
calculating separate inflation factors, K̂_i, within each group and comparing
them across groups. Under the assumption of a common theoretical inflation
factor K across groups, these estimated inflation factors are distributed
approximately as K χ²_{J-1}/(J-1) (unless p̄_i is very close to 0 or to 1).

We should thus see a straight line, or at least a smooth curve, when the
ordered inflation factors are plotted on chi square probability paper. If
one or two points are far removed from the others, this suggests differing
amounts of extrabinomial variation across groups. This situation would
need to be reflected in subsequent analyses.

The results for LeBlanc's Tests A and B are summarized below in
Tables V.1 and V.2. In these tests J = 4 and I = 7. The rank is the order
of the inflation factor, from smallest to largest. The plotting position
for the inflation factor with rank i is 100(i - 0.5)/7.

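TABLE V.1. LE BLANC TEST A - VARIANCE INFLATION FACTOR BY GROUP

Group    Factor (K_i)    Rank    Plotting Position
  1          0.00          1            7.14
  2          1.28          3           35.71
  3          2.58          6           78.57
  4          1.40          4           50.00
  5          1.83          5           64.29
  6         13.76          7           92.86
  7          1.02          2           21.43
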


TABLE V.2. LE BLANC TEST B - VARIANCE INFLATION FACTOR BY GROUP

Group    Factor (K_i)    Rank    Plotting Position
  1          2.60          6           78.57
  2          0.34          1            7.14
  3          1.38          3           35.71
  4          1.93          5           64.29
  5          0.59          2           21.43
  6          1.49          4           50.00
  7          8.91          7           92.86
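The ranks and plotting positions in Tables V.1 and V.2 follow mechanically from the factors. The Python sketch below reproduces them and also returns, for each factor, the matching quantile of the (1/3)χ²_3 reference distribution, which is what chi square probability paper with 3 d.f. encodes. For 3 d.f. the chi square c.d.f. has a closed form in terms of the error function, erf(√(x/2)) - √(2x/π)e^(-x/2), which the sketch inverts by bisection. The function names are hypothetical, and the sketch assumes J = 4 beakers per group.

```python
import math


def chi2_3_cdf(x: float) -> float:
    """P(chi square with 3 d.f. <= x), using the closed form for 3 d.f."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def chi2_3_ppf(q: float) -> float:
    """Invert chi2_3_cdf by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_3_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def probability_plot_points(factors):
    """Return (group, K_i, rank, plotting position, reference quantile) per group.

    factors: mapping of group number to estimated inflation factor K_i.
    Rank 1 is the smallest factor; the plotting position is 100(i - 0.5)/I;
    the reference quantile is that of (1/3) chi square(3), i.e. J = 4.
    """
    ordered = sorted(factors.items(), key=lambda item: item[1])
    n = len(ordered)
    rows = []
    for rank, (group, k) in enumerate(ordered, start=1):
        position = 100.0 * (rank - 0.5) / n
        reference = chi2_3_ppf(position / 100.0) / 3.0
        rows.append((group, k, rank, position, reference))
    return sorted(rows)  # back in group order


if __name__ == "__main__":
    test_b = {1: 2.60, 2: 0.34, 3: 1.38, 4: 1.93, 5: 0.59, 6: 1.49, 7: 8.91}
    for group, k, rank, pos, ref in probability_plot_points(test_b):
        print(f"{group}  {k:5.2f}  {rank}  {pos:6.2f}  {ref:5.2f}")
```

Plotting the factors against the reference quantiles gives the same picture as chi square probability paper: points near the 45 degree line are consistent with a common K of about 1, and a point far above it flags a group with extra heterogeneity.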

The inflation factors are plotted versus their plotting positions on
chi square probability paper with 3 d.f. The results for Tests A and B
are shown in Figures V.1 and V.2, respectively. The reference lines in
those plots correspond to the theoretical c.d.f. of the (1/3)χ²_3 distribution.
The two plots look remarkably similar. The factors corresponding to
group 6 in Test A and group 7 in Test B are substantially out of line
with those from the other groups. The probability that the maximum of
7 independent random variables, each distributed as (1/3)χ²_3, exceeds 8.91
is 5×10^-5, and the probability that it exceeds 13.76 is 4×10^-8. Thus these
extreme factors are certainly incompatible with beaker to beaker homogeneity
and appear to be incompatible with the heterogeneity observed in the
remaining groups. Apart from the extreme factors, the remaining groups
appear to exhibit beaker to beaker heterogeneity in excess of that to be
expected on the basis of binomial theory. The factors all seem comparable
across groups. The average inflation factors, excluding the extreme groups,
are 1.35 for Test A and 1.39 for Test B. The test statistic, Z, in EXAX2
that tests the hypothesis of overall beaker to beaker heterogeneity is
1.00 in Test A and 1.165 in Test B after the extreme groups have been
separated. Under the hypothesis of no beaker to beaker heterogeneity, Z
has a standard normal distribution. Thus Z is significant at the 16 percent
level in Test A and at the 12 percent level in Test B. These results, in
agreement with those in Figures V.1 and V.2, are suggestive of some beaker
to beaker heterogeneity but do not provide strong statistical evidence.
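The two tail probabilities quoted above can be reproduced with a short calculation. If the seven factors were independent with common distribution (1/3)χ²_3, the probability that the largest exceeds k is 1 - [1 - P(χ²_3 > 3k)]^7, and for 3 d.f. the upper tail has the closed form erfc(√(x/2)) + √(2x/π)e^(-x/2). The Python sketch below, with hypothetical function names, evaluates this. It gives about 4.7×10^-5 for 8.91 and about 4.0×10^-8 for 13.76, consistent with the rounded values in the text.

```python
import math


def chi2_3_sf(x: float) -> float:
    """Upper tail probability P(chi square with 3 d.f. > x), closed form."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def prob_max_exceeds(k: float, n_groups: int = 7) -> float:
    """P(max of n_groups independent (1/3) chi square(3) variables exceeds k)."""
    tail = chi2_3_sf(3.0 * k)
    # 1 - (1 - tail)**n_groups, written to stay accurate for very small tails
    return -math.expm1(n_groups * math.log1p(-tail))


if __name__ == "__main__":
    for k in (8.91, 13.76):
        print(f"P(max > {k}) = {prob_max_exceeds(k):.1e}")
```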

Two  reasonable  approaches  would  be  to  adjust  the  data  in  the  extreme  groups 
by  an  inflation  factor  based  only  on  the  responses  from  those  groups  and 
then  either  apply  no  adjustment  to  the  remaining  groups  or  else  apply  the 
adjustment  factors  calculated  immediately  above  to  these  groups.  The 
second  approach  is  slightly  more  conservative  than  the  first  and  we  adopt 
it  here. 

The suggested adjustment procedures for Tests A and B are as follows:
Test A

Group 6: K̂_6 = 13.76, K_6,u = K̂_6(J-1)/χ²_{J-1}(0.05) = (13.76)(3)/χ²_3(0.05)
= (13.76)(3)/0.352 = 117.3

Remaining groups: K̂ = 1.35, K_u = K̂(I-1)(J-1)/χ²_{(I-1)(J-1)}(0.05)
= (1.35)(18)/9.39 = 2.59

Thus we reduce the effective sample size per beaker in group 6 to
20/13.76 = 1.45 while maintaining the observed mortality rates. We reduce
the effective sample size per beaker in the other groups to 20/1.35 = 14.8
while maintaining the observed mortality rates. The degrees of freedom
per group are max[(JN/K_u - 1), (J - 1)]. We thus associate 3 degrees of
freedom with group 6 (since (4)(20)/117.3 - 1 < 3) and (4)(20)/2.59 - 1 = 29.9 ≈ 30
degrees of freedom with each of the other groups.

Test B

Group 7: K̂_7 = 8.91, K_7,u = K̂_7(J-1)/χ²_{J-1}(0.05) = (8.91)(3)/0.352 = 75.94

Remaining groups: K̂ = 1.39, K_u = K̂(I-1)(J-1)/χ²_{(I-1)(J-1)}(0.05)
= (1.39)(18)/9.39 = 2.66

Thus we reduce the effective sample size per beaker in group 7 to 20/8.91 =
2.24 while maintaining the observed mortality rates. We reduce the effective
sample size per beaker in the other groups to 20/1.39 = 14.4 while
maintaining the observed mortality rates. We associate 3 degrees of
freedom with group 7 and (4)(20)/2.66 - 1 = 29.1 ≈ 29 degrees of freedom
with each of the other groups.

The results of the adjustment procedures applied to the data from Tests A
and B are presented in Tables V.3 and V.4, respectively. These adjusted
values are used as basic input "data" for subsequent analyses. We then
proceed as if there is no beaker to beaker heterogeneity within groups;
the extrabinomial variation has been accounted for by the adjustment
procedure.

C.  ALTERNATIVE  MEASURES  OF  MORTALITY 

In the previous subsection we calculated adjustment factors to account
for beaker to beaker heterogeneity in mortality responses within groups.
All the examples considered there pertained to 21 day mortality. However,
in Section III we noted that there are alternative measures of mortality
which are important to study, such as 7 day mortality, 21 day mortality
conditional on 7 day survival, etc. Each such measure provides information
about mortality during different life stages and thus helps to distinguish
among the various causes of mortality, biological and experimental.

It  is  not  clear  a  priori  whether  or  not  the  extent  of  beaker  to  beaker 
heterogeneity  within  groups  observed  and  adjusted  for  in  the  21  day 
mortality  responses  is  also  applicable  for  other  mortality  responses.  If 
that  is  the  case  then  the  adjustment  factors  calculated  for  21  day  mortality 
TABLE V.3. EFFECTIVE SAMPLE SIZES AND RESPONSES IN LE BLANC
TEST A 21 DAY MORTALITY DATA AFTER ADJUSTMENT
FOR BEAKER TO BEAKER HETEROGENEITY

Group           Beaker A   Beaker B   Beaker C   Beaker D    d.f.
  1    Dead        2.2        2.2        2.2        2.2
       Live       12.6       12.6       12.6       12.6
       Total      14.8       14.8       14.8       14.8        30
  2    Dead        1.5        0.0        0.0        0.7
       Live       13.3       14.8       14.8       14.1
       Total      14.8       14.8       14.8       14.8        30
  3    Dead        0.0        0.7        3.7        3.0
       Live       14.8       14.1       11.1       11.8
       Total      14.8       14.8       14.8       14.8        30
  4    Dead        2.2        0.0        2.2        0.7
       Live       12.6       14.8       12.6       14.1
       Total      14.8       14.8       14.8       14.8        30
  5    Dead        1.5        0.0        3.0        0.7
       Live       13.3       14.8       11.8       14.1
       Total      14.8       14.8       14.8       14.8        30
  6    Dead       0.14       0.29       1.09       1.38
       Live       1.31       1.16       0.36       0.07
       Total      1.45       1.45       1.45       1.45         3
  7    Dead       14.8       14.8       14.1       14.8
       Live        0.0        0.0        0.7        0.0
       Total      14.8       14.8       14.8       14.8        30

TABLE V.4. EFFECTIVE SAMPLE SIZES AND RESPONSES IN LE BLANC
TEST B 21 DAY MORTALITY DATA AFTER ADJUSTMENT
FOR BEAKER TO BEAKER HETEROGENEITY

Group           Beaker A   Beaker B   Beaker C   Beaker D    d.f.
  1    Dead        0.7        1.4        0.0        3.6
       Live       13.7       12.9       14.4       10.8
       Total      14.4       14.4       14.4       14.4        29
  2    Dead        2.2        2.9        2.9        1.4
       Live       12.2       11.5       11.5       12.9
       Total      14.4       14.4       14.4       14.4        29
  3    Dead        0.0        1.4        0.0        1.4
       Live       14.4       12.9       14.4       12.9
       Total      14.4       14.4       14.4       14.4        29
  4    Dead        2.2        0.0        0.0        1.4
       Live       12.2       14.4       14.4       12.9
       Total      14.4       14.4       14.4       14.4        29
  5    Dead        1.4        2.2        3.6        2.2
       Live       12.9       12.2       10.8       12.2
       Total      14.4       14.4       14.4       14.4        29
  6    Dead        1.4        0.0        2.9        1.4
       Live       12.9       14.4       11.5       12.9
       Total      14.4       14.4       14.4       14.4        29
  7    Dead       1.91       0.67       0.90       2.13
       Live       0.34       1.57       1.34       0.11
       Total      2.24       2.24       2.24       2.24         3

would also apply to comparisons based on other mortality responses. If the
extent of heterogeneity differs for the various mortality responses, then
separate adjustment factors need to be used for each mortality response
studied. To get some indication of which situation is the case, we calculate
heterogeneity adjustment factors for the response 21 day mortality
conditional on 14 day survival in LeBlanc's Tests A and B. The procedure
used for calculating the adjustment factors is very similar to that discussed
and illustrated in the previous subsection.


LeBlanc  Test  A  -  21  day  mortality  conditional  on  14  day  survival 


Group 1: p_11 = 0.00, p_12 = 0.00, p_13 = 0.056, p_14 = 0.00, p̄_1 = 0.014,
N_11 = N_12 = N_14 = 17, N_13 = 18, N̄_1 = 17.25,
Var(p_1j) = 0.000784, p̄_1(1 - p̄_1)/N̄_1 = 0.00080,
K_1 = Var(p_1j)/[p̄_1(1 - p̄_1)/N̄_1] = 0.000784/0.000800 = 0.98


Group 2: p_21 = 0.10, p_22 = 0.00, p_23 = 0.00, p_24 = 0.00, p̄_2 = 0.025,
N_21 = N_22 = N_23 = 20, N_24 = 19, N̄_2 = 19.75,
Var(p_2j) = 0.0025, p̄_2(1 - p̄_2)/N̄_2 = 0.00123,
K_2 = Var(p_2j)/[p̄_2(1 - p̄_2)/N̄_2] = 0.0025/0.00123 = 2.03


Group 3: p_31 = p_32 = p_33 = p_34 = p̄_3 = 0, N_31 = 20, N_32 = 19, N_33 = 15,
N_34 = 16, N̄_3 = 17.5, Var(p_3j) = 0.00, p̄_3(1 - p̄_3)/N̄_3 = 0.00,
K_3 = Var(p_3j)/[p̄_3(1 - p̄_3)/N̄_3] = 0.00/0.00 = indeterminate. Define K_3 to be 1.00.


Group 4: p_41 = p_42 = p_44 = 0.00, p_43 = 0.111, p̄_4 = 0.028, N_41 = 17,
N_42 = 20, N_43 = N_44 = 19, N̄_4 = 18.75, Var(p_4j) = 0.0031,
p̄_4(1 - p̄_4)/N̄_4 = 0.00145,
K_4 = Var(p_4j)/[p̄_4(1 - p̄_4)/N̄_4] = 0.0031/0.00145 = 2.14

Group 5: p_51 = 0.10, p_52 = 0.00, p_53 = 0.111, p_54 = 0.050, p̄_5 = 0.065,
N_51 = N_52 = N_54 = 20, N_53 = 18, N̄_5 = 19.5,
Var(p_5j) = 0.0026, p̄_5(1 - p̄_5)/N̄_5 = 0.00312,
K_5 = Var(p_5j)/[p̄_5(1 - p̄_5)/N̄_5] = 0.0026/0.00312 = 0.833


Group 6: Delete beaker D from the calculation since it has just one live
daphnid on day 14. Thus p_64 cannot be estimated very precisely.

p_61 = 0.053, p_62 = 0.00, p_63 = 0.167, p̄_6 = 0.073, N_61 = 19,
N_62 = 16, N_63 = 6, N̄_6 = 13.67, Var(p_6j) = 0.0073,
p̄_6(1 - p̄_6)/N̄_6 = 0.0050,
K_6 = Var(p_6j)/[p̄_6(1 - p̄_6)/N̄_6] = 0.0073/0.0050 = 1.46


Group 7: We omit this group from the calculations since the 14 day sample
sizes are very small in all 4 beakers (0, 0, 1, 1), and so the p_7j
cannot be estimated very precisely.


An overall inflation factor based on the results from groups 1-6 is

$$\hat K = \frac{\sum_{i=1}^{6}\hat K_i}{6} = \frac{0.98 + 2.03 + 1.00 + 2.14 + 0.83 + 1.46}{6} = 1.41.$$

Note that K̂_6 does not appear to be far removed from the other K̂_i's, as
it was for the unconditional 21 day mortality. We present a chi
square probability plot to determine whether the extent of observed
beaker to beaker extrabinomial variation is the same for all groups or
is greater for some groups than for others. See the discussion in the
previous subsection for a more detailed description of the procedure.


The results for LeBlanc's Test A are summarized below in Table V.5
and are plotted in Figure V.3. The majority of the K̂_i's lie above the
cumulative distribution function of a (1/3)χ²_3 distributed random variable.

With the exception of the very large K̂_6 for group 6 in Figure V.1, the
plots for the unconditional and the conditional mortalities (i.e. Figures
V.1 and V.3) look rather similar. In fact, the suggested adjustment factors
are K̂ = 1.35 for the unconditional responses (excepting group 6) and K̂ = 1.41
for the conditional responses. In brief, there again appears to be a small
degree of extrabinomial variation among beakers within groups, but the
extent is not too great.

TABLE V.5. LE BLANC TEST A - 21 DAY MORTALITY CONDITIONAL ON 14 DAY
SURVIVAL - VARIANCE INFLATION FACTOR BY GROUP

Group    Factor (K_i)    Rank    Plotting Position
  1          0.98          2           25.00
  2          2.03          5           75.00
  3          1.00          3           41.67
  4          2.14          6           91.67
  5          0.83          1            8.33
  6          1.46          4           58.33

LeBlanc  Test  B  -  21  Day  Mortality  Conditional  on  14  Day  Survival 


Group 1: p_11 = 0.05, p_12 = 0.053, p_13 = p_14 = 0, p̄_1 = 0.026, N_11 = 20,
N_12 = 19, N_13 = 20, N_14 = 15, N̄_1 = 18.5,
Var(p_1j) = 0.00089, p̄_1(1 - p̄_1)/N̄_1 = 0.00137,
K_1 = 0.00089/0.00137 = 0.65


Group 2: p_21 = 0.056, p_22 = p_23 = 0.111, p_24 = 0.053, p̄_2 = 0.083,
N_21 = N_22 = N_23 = 18, N_24 = 19, N̄_2 = 18.25,
Var(p_2j) = 0.00107, p̄_2(1 - p̄_2)/N̄_2 = 0.00417,
K_2 = 0.00107/0.00417 = 0.26


Group 3: p_31 = p_33 = p_34 = 0.00, p_32 = 0.053, p̄_3 = 0.013, N_31 = 20,
N_32 = N_33 = 19, N_34 = 18, N̄_3 = 19, Var(p_3j) = 0.00070,
p̄_3(1 - p̄_3)/N̄_3 = 0.000675,
K_3 = 0.00070/0.000675 = 1.04


Group 4: p_41 = 0.150, p_42 = p_43 = 0.00, p_44 = 0.053, p̄_4 = 0.051,
N_41 = N_42 = N_43 = 20, N_44 = 19, N̄_4 = 19.75, Var(p_4j) = 0.0050,
p̄_4(1 - p̄_4)/N̄_4 = 0.00245,
K_4 = 0.0050/0.00245 = 2.04


Group 5: p_51 = 0.053, p_52 = 0.105, p_53 = 0.211, p_54 = 0.150, p̄_5 = 0.130,
N_51 = N_52 = N_53 = 19, N_54 = 20, N̄_5 = 19.25,
Var(p_5j) = 0.00450, p̄_5(1 - p̄_5)/N̄_5 = 0.00588,
K_5 = 0.00450/0.00588 = 0.77


Group 6: p_61 = p_64 = 0.10, p_62 = 0.00, p_63 = 0.20, p̄_6 = 0.10,
N_61 = N_62 = N_63 = N_64 = N_6 = 20, Var(p_6j) = 0.0067, p̄_6(1 - p̄_6)/N_6 = 0.0045,
K_6 = 0.0067/0.0045 = 1.49


Group 7: Delete beaker D from the calculation since it has just one live
daphnid on day 14. Thus p_74 cannot be estimated very precisely.

p_71 = 0.40, p_72 = p_73 = 0.00, p̄_7 = 0.133, N_71 = 5, N_72 = 14,
N_73 = 12, N̄_7 = 10.33, Var(p_7j) = 0.0533, p̄_7(1 - p̄_7)/N̄_7 = 0.0112,
K_7 = 0.0533/0.0112 = 4.76


The  inflation  factor  from  group  7  is  somewhat  larger  than  the  others. 
We  will  check  graphically  below  whether  or  not  it  appears  to  be  in  line 
with  the  others.  We  present  a  chi  square  probability  plot  to  determine 
whether  the  extent  of  extrabinomial  variation  (if  any)  appears  to  be  the 
same  for  all  groups. 


The results for LeBlanc's Test B are summarized below in Table V.6
and are plotted in Figure V.4. The K̂_i's lie above the cumulative distribution
function of a (1/3)χ²_3 distributed random variable. The factor for group 7
appears to be substantially out of line with those of the other groups.


A straight line fitted to the K̂_i's from groups 1 to 6 has a slope 1.36
times that which would be associated with a (1/3)χ²_3 distributed random
variable. Note that the appearance of Figure V.4, based on mortality
conditional on 14 day survival, is very similar to that of Figure V.2,
based on overall mortality. This suggests that the heterogeneity in
group 7 is not just an early life stage phenomenon but persists to
later stages of the test.


The appearance of Figure V.4 suggests that we calculate a common inflation factor for groups 1-6 and a separate factor for group 7. The common factor for groups 1-6 can be based on the average of the $K_i$'s, namely

$$\bar K = \frac{1}{6}\sum_{i=1}^{6} K_i = 1.04.$$

The factor for group 7 would be $K_7 = 4.76$. Note that $\bar K$ is somewhat smaller than the slope estimated from Figure V.4 (i.e. 1.04 versus 1.36). $\bar K$ may be biased downward a bit by the deletion of the largest value from the average. The question of which estimate is better (i.e. average value or slope from probability plot) is a matter for further detailed research and is not pursued here. For the purpose of definiteness we suggest using $\bar K$. Since $\bar K$ is essentially 1, we recommend no adjustments in effective sample sizes in groups 1-6 and adjustment by a factor of 4.76 in group 7.
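The pooled factor and the resulting group 7 adjustment can be sketched as follows, using the group factors listed in Table V.6. Dividing each beaker's number of daphnids by the factor is our reading of "adjustment by a factor of 4.76"; the helper `effective_sizes` is hypothetical and not part of the report.

```python
from statistics import mean

# Variance inflation factors K_i for groups 1-7 (Table V.6)
factors = {1: 0.65, 2: 0.26, 3: 1.04, 4: 2.04, 5: 0.77, 6: 1.49, 7: 4.76}

# Common factor for groups 1-6: simple average of the K_i's
k_bar = mean(factors[g] for g in range(1, 7))


def effective_sizes(sizes, k):
    """Hypothetical helper: shrink beaker sizes by the inflation factor k."""
    return [n / k for n in sizes]


print(round(k_bar, 2))                            # common factor for groups 1-6
print(effective_sizes([5, 14, 12], factors[7]))   # group 7 beakers A-C
```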

In  conclusion,  the  results  in  this  subsection  suggest  that  different 
degrees  of  beaker  to  beaker  heterogeneity  hold  for  different  measures  of 
mortality.  Therefore  separate  adjustment  factors  should  be  calculated 
and  applied  for  each  measure  of  mortality  studied. 


TABLE V.6. LE BLANC TEST B - 21 DAY MORTALITY CONDITIONAL ON 14 DAY SURVIVAL - VARIANCE INFLATION FACTOR BY GROUP

Group   Factor (K_i)   Rank   Plotting Position
  1         0.65         2          21.4
  2         0.26         1           7.1
  3         1.04         4          50.0
  4         2.04         6          78.6
  5         0.77         3          35.7
  6         1.49         5          64.3
  7         4.76         7          92.9
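The ranks and plotting positions in Table V.6 follow from sorting the factors and assigning plotting position $100(i-0.5)/7$ to rank $i$, the rule used throughout this section. The sketch below also computes, at each plotting position, the quantile of a $\frac{1}{3}\chi^2_3$ variable, which is what the reference line on the chi square probability paper represents. To stay within the standard library it uses the closed-form $\chi^2_3$ c.d.f. and inverts it by bisection; a statistics package's chi-square quantile function would serve equally well. Function names are illustrative.

```python
import math


def chi2_3_cdf(x):
    """Closed-form c.d.f. of a chi-square variable with 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_3_quantile(prob, lo=0.0, hi=100.0):
    """Invert the c.d.f. by bisection; it is increasing on [0, inf)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_3_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


factors = {1: 0.65, 2: 0.26, 3: 1.04, 4: 2.04, 5: 0.77, 6: 1.49, 7: 4.76}
n_groups = len(factors)
rows = []
for rank, group in enumerate(sorted(factors, key=factors.get), start=1):
    position = 100 * (rank - 0.5) / n_groups
    reference = chi2_3_quantile(position / 100) / 3   # (1/3) chi-square_3 quantile
    rows.append((group, factors[group], rank, round(position, 1), reference))

for row in sorted(rows):
    print("group %d  K=%.2f  rank %d  position %.1f  reference %.3f" % row)
```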

D.  ADJUSTMENTS  TO  ACCOUNT  FOR  BEAKER  TO  BEAKER  HETEROGENEITY 
OF  LENGTH  RESPONSES 

In  Section  IV  we  tested  for  beaker  to  beaker  heterogeneity  within  groups 
for  length  responses  in  the  data  from  LeBlanc's  Tests  A  and  B.  In  both 
tests,  statistically  significant  heterogeneity  was  found.  The  tests  were 
carried  out  by  means  of  a  two  way  nested  analysis  of  variance,  the  components 
of  which  are  indicated  below.  It  is  assumed  in  the  ANOVA  table  that  there 
is  a  balanced  situation  with  I  treatment  and  control  groups,  J  beakers  per 
group,  and  N  daphnids  per  beaker.  This  assumption  of  balance  is  usually 
very nearly satisfied in aquatic toxicity test length data, except perhaps
in  those  groups  which  experience  high  mortality.  In  particular  in  LeBlanc's 
data,  all  but  the  highest  treatment  groups  analyzed  (i.e.  group  6  in  Test  A 
and  group  7  in  Test  B)  have  nearly  the  same  sample  sizes. 

COMPONENTS OF ANOVA TABLE TO TEST FOR BEAKER TO BEAKER HETEROGENEITY IN LENGTHS

Source                      d.f.       Expected Mean Square
Groups                      I-1        $\sigma_c^2 + N\sigma_b^2 + \frac{JN}{I-1}\sum_{i=1}^{I}\alpha_i^2$
Beakers Within Groups       I(J-1)     $\sigma_c^2 + N\sigma_b^2$
Daphnids Within Beakers     IJ(N-1)    $\sigma_c^2$

In agreement with the notation used in Section IV, $\alpha_i$ represents the fixed group effect, $\sigma_b^2$ represents the variance of the random beaker effect within groups, and $\sigma_c^2$ represents the variance of the random daphnid effect within beakers.
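For a balanced layout the three mean squares in the table can be computed directly. This is a minimal sketch, assuming exactly J beakers per group and N daphnids per beaker; the nested-list layout `data[i][j]`, the function name, and the toy numbers are illustrative. (The report handles the slight imbalance in LeBlanc's data by using an average N.)

```python
from statistics import mean


def nested_anova(data):
    """Mean squares for a balanced two way nested design.

    data[i][j] is the list of lengths for beaker j in group i; every group has the
    same number of beakers J and every beaker the same number of daphnids N.
    """
    I, J, N = len(data), len(data[0]), len(data[0][0])
    beaker_means = [[mean(beaker) for beaker in group] for group in data]
    group_means = [mean(means) for means in beaker_means]
    grand_mean = mean(group_means)

    ss_groups = J * N * sum((g - grand_mean) ** 2 for g in group_means)
    ss_beakers = N * sum(
        (m - g) ** 2 for means, g in zip(beaker_means, group_means) for m in means
    )
    ss_within = sum(
        (y - m) ** 2
        for group, means in zip(data, beaker_means)
        for beaker, m in zip(group, means)
        for y in beaker
    )
    return {
        "groups": ss_groups / (I - 1),                             # d.f. I-1
        "beakers_within_groups": ss_beakers / (I * (J - 1)),       # MSI, d.f. I(J-1)
        "daphnids_within_beakers": ss_within / (I * J * (N - 1)),  # MSE, d.f. IJ(N-1)
    }


toy = [[[1, 3], [5, 7]], [[2, 2], [2, 4]]]   # illustrative numbers only
print(nested_anova(toy))
```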


It is clear from the ANOVA table that the mean square for beakers within groups provides a correct error yardstick for inferences about group effects, whether or not beaker to beaker heterogeneity exists: its expectation differs from that of the groups mean square only by the term involving the group effects $\alpha_i$. In fact, basing inferences on this error term is equivalent to analyzing the length data on a per beaker basis (i.e. analyzing only average lengths within beakers). This is currently the most commonly used approach for the analysis of such data. It is a conservative approach.

However, although use of the beakers within groups mean square is correct, it is based on relatively few degrees of freedom (I(J-1)) and thus can lead to reduced sensitivity of inferences about lengths if I(J-1) is small. This would be the case particularly if J were 2 or 3.

It would be beneficial for the sensitivity of the analyses if the number of degrees of freedom available for the error yardstick could be increased, perhaps by combining it with the information in another mean square.

A scheme for doing this is discussed below.


The usual approach to combining information from several mean squares in analysis of variance is based on a preliminary test. Namely, first test the hypothesis that $\sigma_b^2 = 0$ by comparing the mean square for beakers within groups to the mean square for daphnids within beakers. If the hypothesis is rejected, use the beakers within groups mean square with I(J-1) degrees of freedom as an error yardstick. If the hypothesis is not rejected, then combine the sums of squares for beakers within groups and for daphnids within beakers and use a pooled error estimate based on I(J-1) + IJ(N-1) = I(JN-1) degrees of freedom. This procedure corresponds to either carrying out comparisons of lengths across groups on a per beaker basis or else pooling data across beakers within groups and carrying out comparisons on a per daphnid basis, ignoring the existence of replicate beakers. This approach is somewhat dichotomous: one of two rather different procedures is carried out, depending on whether the preliminary test rejects or accepts. It would be desirable to use a procedure that provides a continuum of options between these two extremes and that does not rely on the outcome of a preliminary test. We now describe such a procedure.


Each individual length has variance $\sigma_c^2 + \sigma_b^2$. Since the lengths from the same beaker are correlated due to the beaker effects, the average over all NJ lengths within a group has variance $(\sigma_c^2 + N\sigma_b^2)/NJ = \sigma_c^2/NJ + \sigma_b^2/J$. Suppose we wish to account for the within beaker correlation by reducing the effective sample size within beakers from N to x and then treating the "adjusted samples" as if they were independent, with variance $\sigma_c^2 + \sigma_b^2$.

We would thus disregard beakers and carry out analyses on a per daphnid basis. To determine the effective sample size x per beaker, we equate the variances of sample averages under the true and hypothesized situations. Namely

$$\frac{\sigma_c^2 + \sigma_b^2}{Jx} = \frac{\sigma_c^2 + N\sigma_b^2}{JN}.$$

Thus

$$x = \frac{N(\sigma_c^2 + \sigma_b^2)}{\sigma_c^2 + N\sigma_b^2}, \qquad\text{and so}\qquad 1 \le x \le N.$$
We treat the data as if there were Jx independent observations per group. The degrees of freedom to estimate error would then be I(Jx-1). Now

$$I(Jx-1) = I\left(\frac{JN(\sigma_c^2 + \sigma_b^2)}{\sigma_c^2 + N\sigma_b^2} - 1\right).$$

It is easy to see that

$$I(J-1) \le I(Jx-1) \le I(JN-1),$$

the extremes occurring as $\sigma_b^2/\sigma_c^2 \to \infty$ or $\sigma_b^2/\sigma_c^2 \to 0$, respectively. Thus using I(Jx-1) degrees of freedom is a compromise between using I(J-1) d.f. and I(JN-1) d.f.
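The effective sample size and the corresponding degrees of freedom can be checked numerically. The sketch below, with hypothetical function names, evaluates x and I(Jx-1) for a range of variance component ratios; the values of I, J, N and of the variance components in the usage loop are made up for illustration. The degrees of freedom move from I(JN-1) = 378 toward I(J-1) = 18 as $\sigma_b^2/\sigma_c^2$ grows.

```python
def effective_size(sigma_b2, sigma_c2, n):
    """Effective number of independent daphnids per beaker, x = N(sc2 + sb2)/(sc2 + N sb2)."""
    return n * (sigma_c2 + sigma_b2) / (sigma_c2 + n * sigma_b2)


def effective_df(I, J, x):
    """Error degrees of freedom I(Jx - 1) when each beaker counts as x observations."""
    return I * (J * x - 1)


I, J, N = 6, 4, 16   # illustrative design
for sigma_b2 in (0.0, 0.01, 0.1, 1.0, 100.0):
    x = effective_size(sigma_b2, 1.0, N)   # sigma_c2 fixed at 1
    print(f"sigma_b2/sigma_c2 = {sigma_b2:>6}: x = {x:6.3f}, d.f. = {effective_df(I, J, x):7.2f}")
```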

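In order to calculate x we need to know $\sigma_c^2$ and $\sigma_b^2$. In general these quantities are unknown; however, they can be estimated from the data. Let

MSI = mean square for beakers within groups,

MSE = mean square for daphnids within beakers.

Then

$$\hat\sigma_c^2 = \mathrm{MSE},\qquad \hat\sigma_c^2 + N\hat\sigma_b^2 = \mathrm{MSI},\qquad \hat\sigma_c^2 + \hat\sigma_b^2 = \frac{1}{N}\,\mathrm{MSI} + \left(1 - \frac{1}{N}\right)\mathrm{MSE}.$$

Thus

$$\hat x = \frac{\mathrm{MSI} + (N-1)\,\mathrm{MSE}}{\mathrm{MSI}} = 1 + (N-1)\,\mathrm{MSE}/\mathrm{MSI}.$$

We constrain $\hat x$ to be bounded by N. Namely we take

$$\hat x = \min\left[1 + (N-1)\,\mathrm{MSE}/\mathrm{MSI},\; N\right].$$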

Since there is uncertainty in $\hat x$, a very conservative assumption would be to take x to be its minimum value of 1. This would correspond to using MSI as an error yardstick with I(J-1) d.f., so that no information is pooled from MSE. A less conservative assumption would be to use $I(J\hat x-1)$ d.f.; that is, we use max(MSI, MSE) as the error yardstick but utilize the information from MSE to increase the degrees of freedom assumed from I(J-1) to $I(J\hat x-1)$. A compromise between these two values of x is to calculate a lower confidence bound for x and use this value in the expression for degrees of freedom. Let $\underline{x}$ denote a 95 percent lower confidence bound on x. We then use $I(J\underline{x}-1)$ degrees of freedom in conjunction with max(MSI, MSE).

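A lower confidence bound on x corresponds to an upper confidence bound on $\sigma_b^2/\sigma_c^2$, since

$$x = \frac{N(1 + \sigma_b^2/\sigma_c^2)}{1 + N\sigma_b^2/\sigma_c^2} = 1 + \frac{N-1}{1 + N\sigma_b^2/\sigma_c^2}$$

and

$$\mathrm{MSI}/\mathrm{MSE} \sim \left(1 + N\sigma_b^2/\sigma_c^2\right) F_{I(J-1),\,IJ(N-1)}.$$

Now

$$\frac{\mathrm{MSI}/\mathrm{MSE}}{F_{I(J-1),\,IJ(N-1)}(0.05)} = (\mathrm{MSI}/\mathrm{MSE})\,F_{IJ(N-1),\,I(J-1)}(0.95)$$

is an upper 95 percent confidence bound on $(1 + N\sigma_b^2/\sigma_c^2)$. Therefore

$$\underline{x} = \min\left\{1 + \frac{N-1}{(\mathrm{MSI}/\mathrm{MSE})\,F_{IJ(N-1),\,I(J-1)}(0.95)},\; N\right\}.$$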

The suggested compromise procedure is to use max[MSI, MSE] as the error yardstick but to increase the effective degrees of freedom to $I(J\underline{x}-1)$. In subsequent comparisons among treatment means we use $(\max[\mathrm{MSI}, \mathrm{MSE}]/JN)^{1/2}$ as the estimated standard error of treatment group averages, and we associate $I(J\underline{x}-1)$ d.f. with it.

This procedure has intuitive appeal as a means of increasing the sensitivity
of  inferences  about  lengths  when  I(J-l)  is  small.  Its  precise  theoretical 
properties  need  to  be  investigated  in  greater  detail,  perhaps  by  a  Monte 
Carlo  study. 

We  now  apply  this  procedure  to  the  length  data  from  LeBlanc's  Tests  A 
and  B.  Although  N  is  not  entirely  constant  across  beakers,  especially  in 
the  highest  treatment  groups,  we  use  the  above  expressions  with  an  average 
value  of  N. 

LeBlanc  Test  A 

From Section IV, MSI = 0.2722, MSE = 0.1118, I = 6, J = 4, $N \approx \bar N = 401/24 = 16.71$. Thus IJ(N-1) = 377, I(J-1) = 18, $F_{377,18}(0.95) = 1.93$ and MSI/MSE = 2.435. Therefore

$$\underline{x} = 1 + \frac{15.71}{(2.435)(1.93)} = 4.34,$$

$$\hat x = 1 + \frac{15.71}{2.435} = 7.45.$$

We thus see that the point estimate of the effective number of daphnids per beaker is 7.5 and a 95 percent lower confidence bound on this is 4.3.

We associate $I(J\underline{x}-1) = 6((4)(4.34)-1) = 98.2 \approx 98$ d.f. with the mean square for beakers within groups. This compares with 6(4-1) = 18 d.f. associated with this mean square if we pool no information from the within beakers mean square.

LeBlanc  Test  B 


From Section IV, MSI = 0.2707, MSE = 0.1387, I = 7, J = 4, $N \approx \bar N = 459/28 = 16.39$. Thus IJ(N-1) = 431, I(J-1) = 21, $F_{431,21}(0.95) = 1.84$ and MSI/MSE = 1.952. Therefore

$$\underline{x} = 1 + \frac{15.39}{(1.952)(1.84)} = 5.28,$$

$$\hat x = 1 + \frac{15.39}{1.952} = 8.88.$$

The point estimate of the effective number of daphnids per beaker is 8.9 and a 95 percent lower confidence bound on this is 5.3. We associate $I(J\underline{x}-1) = 7((4)(5.28)-1) = 140.84 \approx 141$ d.f. with the mean square for beakers within groups. This compares with 7(4-1) = 21 d.f. associated with this mean square if we pool no information from the within beakers mean square.
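The arithmetic for both tests can be reproduced from MSI, MSE, I, J, the average N, and the tabulated F point. This is a sketch with a hypothetical function name: the F percentage points (1.93 and 1.84) are taken from the text rather than computed, and for Test A we use MSE = 0.1118, the value consistent with the reported ratio MSI/MSE = 2.435. Results agree with the text to within rounding (for example 5.29 rather than 5.28 for Test B).

```python
import math


def pooled_error_df(msi, mse, I, J, n_bar, f95):
    """Point estimate and 95% lower bound of the effective beaker size x, the
    degrees of freedom I(J x_low - 1), and the standard error of a group average.

    f95 is the upper 5% point of F with IJ(N-1) and I(J-1) d.f., supplied from tables.
    """
    ratio = msi / mse
    x_hat = min(1 + (n_bar - 1) / ratio, n_bar)
    x_low = min(1 + (n_bar - 1) / (ratio * f95), n_bar)
    df = I * (J * x_low - 1)
    # Error yardstick max(MSI, MSE), scaled to a treatment group average
    se = math.sqrt(max(msi, mse) / (J * n_bar))
    return x_hat, x_low, df, se


test_a = pooled_error_df(0.2722, 0.1118, I=6, J=4, n_bar=401 / 24, f95=1.93)
test_b = pooled_error_df(0.2707, 0.1387, I=7, J=4, n_bar=459 / 28, f95=1.84)
for name, (x_hat, x_low, df, se) in (("A", test_a), ("B", test_b)):
    print(f"Test {name}: x_hat = {x_hat:.2f}, x_low = {x_low:.2f}, d.f. = {df:.1f}, s.e. = {se:.4f}")
```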

The model upon which the adjustments in degrees of freedom in LeBlanc's Tests A and B were based (i.e. upon which the within and between mean squares were effectively pooled) assumes that the components of variation are the same across groups. Namely it is assumed that there is random beaker to beaker variation with variance $\sigma_b^2$ and random within beaker variation with variance $\sigma_c^2$. Before applying adjustments based on this model, one should determine whether all the groups appear to have the same variability or whether there are one or two groups with variability substantially higher than the remainder that inflate the overall estimates of variability for the other groups. If that were the case, separate variability estimates would need to be calculated for the group or groups with large variability, and subsequent analyses would need to take this into account, perhaps by using weighted least squares.


We  discuss  below  a  graphical  procedure  for  determining  whether  the 
observed  beaker  to  beaker  variation  is  due  to  all  groups  or  whether  it  is 
due  just  to  one  or  two  and  we  apply  this  procedure  to  the  length  responses 
from  LeBlanc's  Tests  A  and  B.  The  procedure  consists  of  calculating  the 
mean  square  for  beakers  within  groups  separately  within  each  group  and 
normalizing  these  values  by  the  pooled  mean  square  for  daphnids  within 
beakers.  Under  the  assumption  of  common  variance  structure  across  groups, 
these normalized ratios estimate $1 + N\sigma_b^2/\sigma_c^2$ and are distributed approximately as $(1 + N\sigma_b^2/\sigma_c^2)\,\chi^2_{J-1}/(J-1)$. We should thus see a straight line, or at least a smooth curve, when the ordered normalized ratios are plotted on chi square probability paper. If one or two points are far removed from the others, this suggests differing variability structures across groups, and subsequent analyses would need to reflect this.
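A minimal sketch of the per-group normalized ratios for a balanced layout follows. The `data[i][j]` nesting, the function name, and the toy numbers are illustrative assumptions, not LeBlanc's data. Each group's beaker mean square uses J-1 degrees of freedom and is divided by the pooled within beaker mean square.

```python
from statistics import mean


def normalized_ratios(data):
    """data[i][j] lists the lengths in beaker j of group i (balanced design)."""
    I, J, N = len(data), len(data[0]), len(data[0][0])
    # Pooled mean square for daphnids within beakers, IJ(N-1) d.f.
    ss_within = sum((y - mean(b)) ** 2 for group in data for b in group for y in b)
    mse = ss_within / (I * J * (N - 1))
    ratios = []
    for group in data:
        beaker_means = [mean(b) for b in group]
        group_mean = mean(beaker_means)
        # Mean square for beakers within this group, J-1 d.f.
        ms_group = N * sum((m - group_mean) ** 2 for m in beaker_means) / (J - 1)
        ratios.append(ms_group / mse)
    return ratios


toy = [[[1, 3], [5, 7]], [[2, 2], [2, 4]]]   # illustrative numbers only
print(normalized_ratios(toy))
```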


The results for LeBlanc's Test A are summarized below. In this test J = 4 and I = 6. The rank is just the order of the ratios, from smallest to largest. The plotting position for the ratio with rank i is 100(i-0.5)/6.


TABLE V.7. LE BLANC TEST A - NORMALIZED MEAN SQUARE RATIOS BY GROUP

Group   Normalized Ratio   Rank   Plotting Position

The normalized ratios are plotted versus their plotting positions on chi square probability paper with 3 d.f. The results are shown in Figure V.5. The reference line in that plot corresponds to the theoretical c.d.f. of a random variable with distribution $\frac{1}{3}\chi^2_3$. The points are seen to fall on a straight line with slope substantially in excess of that which would be expected if there were no beaker to beaker variation. We conclude that this plot suggests the presence of beaker to beaker variation in lengths and that the variation structure is constant across groups.

The results for LeBlanc's Test B are summarized below. In this test J = 4 and I = 7. The terms rank and plotting position have the same meaning as for Test A.

TABLE V.8. LE BLANC TEST B - NORMALIZED MEAN SQUARE RATIOS BY GROUP

Group   Normalized Ratio   Rank   Plotting Position
  1          0.392           1           7.14
  2          3.946           7          92.86
  3          3.876           6          78.57
  4          0.935           4          50.00
  5          0.451           2          21.43
  6          3.421           5          64.29
  7          0.639           3          35.71
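Because MSI is the average of the per-group beaker mean squares when every group has the same J, the normalized ratios in Table V.8 should average to MSI/MSE (approximately, since N varies slightly from beaker to beaker). A quick check using only numbers from this section follows; the cut-off of 2 used to separate the two apparent subsets is our own choice, made by eye from the table.

```python
from statistics import mean

# Normalized mean square ratios by group, LeBlanc Test B (Table V.8)
ratios = {1: 0.392, 2: 3.946, 3: 3.876, 4: 0.935, 5: 0.451, 6: 3.421, 7: 0.639}

average_ratio = mean(ratios.values())
pooled_ratio = 0.2707 / 0.1387          # MSI / MSE for Test B
# Illustrative split between the two apparent subsets (cut-off of 2 is our choice)
high = sorted(g for g, r in ratios.items() if r > 2)
low = sorted(g for g, r in ratios.items() if r <= 2)
print(round(average_ratio, 3), round(pooled_ratio, 3), low, high)
```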

The normalized ratios are plotted versus their plotting positions on chi square probability paper with 3 d.f. The results are shown in Figure V.6. The reference line in that plot corresponds to the theoretical c.d.f. of a random variable with distribution $\frac{1}{3}\chi^2_3$. The ratios seem to fall into two distinct subsets. The lower four values appear to be compatible with the absence of beaker to beaker variation in their groups. The upper three values are separated from the others and indicate the presence of beaker to beaker variation. This graph should be discussed with the investigator to determine whether any identifiable experimental factors could explain the apparent dichotomy between the variability in groups 1, 4, 5 and 7 on the one hand and that in groups 2, 3 and 6 on the other. If so, separate variability estimates might be used for these two subsets of groups. In the absence of such information we will use the overall adjustment results derived earlier in this subsection.
Figure V.1. LeBlanc Test A. Chi square probability plot of mortality data variance inflation factors.

Figure V.2. LeBlanc Test B. Chi square probability plot of mortality data variance inflation factors.

Figure V.3. LeBlanc Test A. 21 day mortality conditional on 14 day survival. Chi square probability plot of variance inflation factors.

Figure V.4. LeBlanc Test B. 21 day mortality conditional on 14 day survival. Chi square probability plot of variance inflation factors.

Figure V.5. LeBlanc Test A. Chi square probability plot of mean squares for beakers within groups normalized by mean square for daphnids within beakers: lengths.

Figure V.6. LeBlanc Test B. Chi square probability plot of mean squares for beakers within groups normalized by mean square for daphnids within beakers: lengths.


VI.  OUTLIER  DETECTION  PROCEDURES 


A.  BACKGROUND 

Another  preliminary  analysis  of  importance  is  the  detection  of  responses 
which  do  not  appear  to  be  in  conformance  with  the  substantial  majority  of 
responses.  Such  exceptional  responses  are  often  referred  to  as  "outliers". 
Outlier  detection  procedures  are  used  to  decide  how  extreme  a  response  must 
be  in  order  to  rule  out  the  possibility  that  its  value  is  reasonably  likely 
to  be  due  just  to  random  variation.  Feder  and  Collins  [1],  Section  X 
and  Subsection  XVII  B,  discuss  outlier  detection  procedures  for  the  analysis 
of  mortality  and  length  data  from  toxicity  tests  on  fathead  minnows.  The 
discussion  here  is  patterned  after  that. 

Before  discussing  the  details  of  outlier  detection  procedures  for  the 
various  responses  considered  here,  we  need  to  discuss  an  important  conceptual 
issue.  Outlier  detection  procedures  look  for  responses  which  differ  from 
a  priori  comparable  responses  more  than  would  be  expected  based  on  random 
variation.  In  the  previous  section  we  estimated  the  extent  of  beaker  to 
beaker  variation  and  adjusted  for  this  variation  by  utilizing  the  differences 
in  responses  observed  in  a  priori  comparable  beakers  (i.e.  within  the  same 
treatment  groups).  The  presence  of  outliers  will  inflate  the  estimates  of 
beaker  to  beaker  random  variation.  Conversely,  the  presence  of  random 
beaker  to  beaker  variation  might  cause  extreme  but  naturally  occurring 
responses  to  appear  as  outliers.  Thus  the  notions  of  outliers  and  of 
inflated  beaker  to  beaker  random  variation  are  somewhat  confounded  and 
obscure  one  another.  If  an  individual  observation  or  a  beaker  average 
looks  extreme  there  is  no  way  to  tell,  based  on  the  data  alone,  whether 
that  response  represents  natural  variation  or  whether  it  comes  from  a 
separate  population,  due  perhaps  to  some  deviation  in  biological  material 
or  experimental  technique.  This  is  a  matter  for  judgement  on  the  part  of 
the  investigator.  Statistical  methods  can  point  to  the  extreme  or  out  of 
line  responses.  They  cannot  determine  the  reasons  for  this  behavior  or 
whether the observations in question should be retained, deleted, or discounted.

Outlier detection procedures and beaker to beaker heterogeneity adjustment procedures impact on one another. Consider the mortality responses for LeBlanc's Tests A and B plotted in Figures II.1 and II.3 and for Goulden's isophorone test plotted in Figure II.8. The responses in group 6 of LeBlanc's Test A, group 7 of LeBlanc's Test B, and group 5 of Goulden's isophorone test appear to be widely separated from one another. Is the separation due to natural random variation, or are one or more beakers out of line with the others? Based on the appearances of the plots, the differences among these beakers in the LeBlanc tests appear to be due to random variation. Two of the four beakers show relatively low mortality while the other two show relatively high mortality. None of the responses are out of line with those from other groups. The situation in group 5 of Goulden's survival data is a bit different. The beaker with 100 percent mortality appears to be out of line with the other two multiple daphnid beakers and with the seven individual daphnid beakers in its group. This suggests that it may be a possible outlier. However, it must be stressed that these interpretations are subjective and may differ among investigators or among data analysts.

Perhaps  as  a  rule  of  thumb,  extreme  deviations  should  be  regarded 
as  evidence  of  random  beaker  to  beaker  variation  unless  there  is  specific 
evidence  to  the  contrary.  This  is  a  matter  for  further  discussion. 

We adopt the following order of analysis. Based on the appearance of the preliminary scatterplots, any responses that are out of line and whose deviations can be traced back to faulty biological material, departures in procedure from test protocol, accidents, etc., will be screened out. The remaining responses will be tested for beaker to beaker heterogeneity, and appropriate error yardsticks will be calculated or data adjustments made.

Outlier detection tests will then be carried out incorporating the previously calculated error yardsticks and data adjustments.

We  illustrate  the  outlier  detection  procedures  below. 

B.  OUTLIER  DETECTION  PROCEDURES  APPLIED  TO  MORTALITY  DATA 

The procedure followed is similar to that discussed in Feder and Collins [1], Section X. Consider the i-th group. Let $(X_{i1}, N_{i1}), (X_{i2}, N_{i2}), \ldots, (X_{iJ_i}, N_{iJ_i})$ denote the numbers of responses and the numbers of daphnids in the various beakers $j = 1, \ldots, J_i$. We drop the subscript i in subsequent discussion, for ease of notation. Note that $(X_j, N_j)$ denote the effective numbers of responses and daphnids respectively, after adjusting for beaker to beaker heterogeneity, rather than the original numbers. The arc sine variance stabilizing transformation is first carried out. In particular, let $p_j$ and $\bar p$ denote the response probability estimates for the j-th beaker and for the group respectively, and let $N = \sum_j N_j$. It can be shown that

$$Q_j = 2N_j^{1/2}\,(1 - N_j/N)^{-1/2}\left[\arcsin\left(p_j^{1/2}\right) - \arcsin\left(\bar p^{1/2}\right)\right],\qquad j = 1, \ldots, J,$$

have approximate standard normal distributions as the $N_j$'s approach $\infty$. Graphical and numerical outlier detection procedures are based on these standardized values. For formal inferences we account approximately for the correlations among the standardized values within groups (approximately $-1/(J-1)$ if the $N_j$'s are about equal) by treating the J standardized values within each group as if they were J-1 independent values. This of course has the most effect when J = 2.
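The standardized values can be computed as below. This sketch takes N as the total number of daphnids in the group and $\bar p = \sum_j X_j / N$, which reproduces the tabulated values for Goulden's groups 4 to 6 given later in this section; the function name is ours, and angles are in radians.

```python
import math


def standardized_values(responses, sizes):
    """Q_j = 2 sqrt(N_j) (1 - N_j/N)^(-1/2) [arcsin sqrt(p_j) - arcsin sqrt(p_bar)].

    responses, sizes: per-beaker (effective) numbers of responses X_j and daphnids N_j
    for one group; at least two beakers are needed so that N_j < N.
    """
    total = sum(sizes)
    p_bar = sum(responses) / total
    angle_bar = math.asin(math.sqrt(p_bar))
    values = []
    for x, n in zip(responses, sizes):
        angle = math.asin(math.sqrt(x / n))
        values.append(2 * math.sqrt(n) * (angle - angle_bar) / math.sqrt(1 - n / total))
    return values


# Goulden isophorone, group 4: 2, 1 and 0 deaths out of 5 in beakers A, B, C
print([round(q, 3) for q in standardized_values([2, 1, 0], [5, 5, 5])])
```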

We  now  apply  these  transformations  to  construct  graphical  outlier 
detection  displays  based  on  normal  probability  plotting  and  associated 
formal  outlier  detection  tests. 


Group 7

Beaker   X_j    N_j    p_j     p̄       N_j/N   N_j p̄     Q_j
A        14.8   14.8   1.000   0.988   0.25    14.622     0.975
B        14.8   14.8   1.000   0.988   0.25    14.622     0.975
C        14.1   14.8   0.953   0.988   0.25    14.622    -0.966
D        14.8   14.8   1.000   0.988   0.25    14.622     0.975

To  prepare  the  normal  probability  plot  we  order  the  standardized  values 
and  plot  the  i-th  smallest  against  the  plotting  position  100  x  (i-0.5)/28 
on  the  probability  scale.  These  values  are  indicated  below. 


 i   Ordered Value   Plotting Position
 1      -3.211              1.8
 2      -2.676              5.4
 3      -2.644              8.9
 4      -1.720             12.5
 5      -1.720             16.1
 6      -1.303             19.6
 7      -1.269             23.2
 8      -0.966             26.8
 9      -0.895             30.4
10      -0.734             33.9
11      -0.703             37.5
12       0.0               41.1
13       0.0               44.6
14       0.0               48.2
15       0.0               51.8
16       0.198             55.4
17       0.222             58.9
18       0.734             62.5
19       0.877             66.1
20       0.877             69.6
21       0.942             73.2
22       0.975             76.8
23       0.975             80.4
24       0.975             83.9
25       1.154             87.5
26       1.441             91.1
27       1.477             94.6
28       1.570             98.2

Note: $X_j$, $N_j$ represent effective values, after adjustment for beaker to beaker heterogeneity.

$$Q_j = (1 - N_j/N)^{-1/2}\, 2\sqrt{N_j}\left[\arcsin\sqrt{p_j} - \arcsin\sqrt{\bar p}\right]$$
The normal probability plot of these points is shown in Figure VI.1.

A theoretical N(0,1) distribution line is drawn in for reference. It is evident that the standardized values lie on a smooth curve which is definitely not normal. The lack of normality is undoubtedly due to the relatively small (effective) sample sizes per beaker coupled with the relatively extreme average mortality rates within each group (i.e. close to 0 or to 1). Note that the minimum expected frequencies $\min(N_j\bar p, N_j\bar q)$ within each group (based on effective sample sizes, with $\bar q = 1 - \bar p$) are less than 2 in all but the first group, where the minimum expected frequency is 2.2. In fact, three of the seven groups have minimum expected frequencies less than 1. The validity of the normal approximation to $\arcsin(p^{1/2})$ is questionable for such small expected frequencies. Furthermore, the five smallest standardized values correspond to observed frequencies of 0 deaths per beaker, where the normal approximation is least reasonable.

Despite the lack of normality, the smooth and continuous nature of the curve in Figure VI.1 suggests no evidence of outlying responses. The extreme discrepancies among mortality rates observed in group 6 have been attributed to random beaker to beaker variation and have been accounted for by adjusting the effective sample sizes sharply downward. Thus the standardized values from this group do not appear to be outliers. Without such prior adjustment they undoubtedly would have.

It should be noted that confirmatory tests should be carried out before declaring a beaker response to be an outlier based just on the appearance of the normal probability plot. This is especially true when the response corresponds to small observed frequencies (especially 0), where the normal approximation is most questionable. Such confirmatory tests for mortality responses are discussed in Feder and Collins [1], Section X. In particular, an exact confirmatory test, based on Poisson theory, is discussed for the case when average group response rates are less than 0.1 or greater than 0.9. This is the case in four of the seven groups in the Test A mortality data, and two of the other three groups are close. In other cases, chi square tests of homogeneity can be carried out, with adjustments for small expected frequencies, or tests based on exact small sample theory (e.g. Fisher's exact test) can be used.
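As an illustration of the exact small sample approach mentioned above, a one-sided Fisher exact test comparing one beaker with the pooled remaining beakers of its group reduces to a hypergeometric tail probability. This is a sketch, not the confirmatory procedure of Feder and Collins [1]: comparing a beaker with the pooled rest, the one-sided alternative, and ignoring the multiplicity of beakers examined are all our assumptions. The example uses Goulden's group 5 counts from the table later in this section (5 of 5 deaths in beaker A, 3 of 10 in beakers B and C combined).

```python
from math import comb


def fisher_upper_tail(x1, n1, x2, n2):
    """One-sided Fisher exact p-value P(X >= x1), where X, the number of responses
    in the first beaker, is hypergeometric given the group's total responses."""
    total_x = x1 + x2
    total_n = n1 + n2
    upper = min(n1, total_x)
    ways = sum(
        comb(total_x, k) * comb(total_n - total_x, n1 - k) for k in range(x1, upper + 1)
    )
    return ways / comb(total_n, n1)


p = fisher_upper_tail(5, 5, 3, 10)
print(f"one-sided p = {p:.4f}")
```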

In  brief,  we  conclude  that  there  is  no  evidence  of  outlying  mortality 
responses  in  this  data  set. 

We now consider some of the other data sets. The pattern of the mortality responses in LeBlanc's Test B is very nearly the same as that in Test A, which we have just discussed in detail. We thus omit discussion of outlier detection in Test B; the results would undoubtedly be very similar to those from Test A, namely no outliers. Since Chapman's test consists of just one daphnid per beaker, we cannot compare mortality response rates among beakers within groups in this test. The mortality patterns in Adams' test with selenium are quite consistent among beakers within groups.

The first four groups have essentially no mortality in any beaker, while the last four groups have 100 percent mortality in each beaker (see Figure II.5). There is thus no suggestion of any outlier responses in this test, and so we omit discussion of outlier detection procedures.
The situation with Goulden's test with isophorone is a bit different. (See Figure II.8. Note that survival rates are plotted in this figure rather than mortality rates.) The beaker in group 5 with 100 percent mortality appears to be out of line with the others in that group. This extreme response may be due to random variation or to some identifiable cause that would make it an outlier. There is no way to distinguish between these two possibilities solely on the basis of the data. The test records must be thoroughly reviewed and biological judgement must be brought to bear. However, we note that there is just one beaker apparently out of line with the others, and there is no overall evidence of beaker to beaker heterogeneity within groups among the beakers with multiple daphnids (see Section III).

We  will  thus  treat  this  response  as  a  potential  outlier  and  carry  out 
graphical  and  analytical  outlier  detection  procedures  to  confirm  or  refute 
this  conjecture.  We  could  have  alternatively  adjusted  the  effective  sample 
sizes  downward  in  all  the  beakers  in  group  5  to  reflect  random  beaker  to 
beaker  variation  in  this  group  only.  The  decision  to  carry  out  the  outlier 
detection  procedure  rather  than  the  heterogeneity  adjustment  procedure  is 
somewhat  subjective. 

Goulden  -  Isophorone 

Since  there  is  a  suggestion  of  greater  mortality  among  the  multiply 
housed  daphnids  (see  Section  III)  we  compare  mortality  rates  only  among 
the  beakers  with  multiple  daphnids. 


I = 6, J = 3 (i.e. 6 groups, 3 beakers per group)
Beaker   X_j   N_j   p_j    p̄       N_j/N   N_j p̄    Q_j

Group 4
A         2     5    0.40   0.200   0.333   1.000     1.211
B         1     5    0.20   0.200   0.333   1.000     0.000
C         0     5    0.00   0.200   0.333   1.000    -2.540

Group 5
A         5     5    1.00   0.533   0.333   2.667     4.121
B         2     5    0.40   0.533   0.333   2.667    -0.732
C         1     5    0.20   0.533   0.333   2.667    -1.943

Group 6
A         4     5    0.80   0.867   0.333   4.333    -0.495
B         4     5    0.80   0.867   0.333   4.333    -0.495
C         5     5    1.00   0.867   0.333   4.333     2.045

To prepare the normal probability plot we order the standardized values and plot the i-th smallest against the plotting position 100 × (i - 0.5)/18 on the probability scale. These values are indicated below.

 i   Ordered value   Plotting position
 1      -2.540              2.8
 2      -2.045              8.3
 3      -1.943             13.9
 4      -1.434             19.4
 5      -1.434             25.0
 6      -1.434             30.6
 7      -1.434             36.1
 8      -0.732             41.7
 9      -0.495             47.2
10      -0.495             52.8
11       0.000             58.3
12       0.495             63.9
13       0.495             69.4
14       1.105             75.0
15       1.105             80.6
16       1.211             86.1
17       2.045             91.7
18       4.121             97.2

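In the table of beaker values, X_j and N_j are the actually observed numbers of deaths and daphnids in beaker j, with no adjustments for beaker to beaker heterogeneity; X and N are the corresponding group totals. The standardized values are

$$Q_j = \left(1 - \frac{N_j}{N}\right)^{-1/2} 2\sqrt{N_j}\left[\arcsin\sqrt{X_j/N_j} - \arcsin\sqrt{X/N}\right]$$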
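As a check on the arithmetic, the standardized values and plotting positions can be recomputed with a short script. This is a minimal Python sketch under our own assumptions: the function names are hypothetical, and the tabled values appear to have been computed with the pooled group proportion X/N rounded to three decimals (0.533, 0.867), so the sketch offers that rounding as an option; with the unrounded proportion the results differ in the third decimal.

```python
import math


def arcsine_standardized(dead, n, pooled_decimals=None):
    """Standardized arcsine deviations Q_j for the beakers of one group.

    dead[j], n[j]: observed deaths and daphnids in beaker j (no heterogeneity
    adjustment). pooled_decimals optionally rounds the pooled group rate X/N.
    """
    x_tot, n_tot = sum(dead), sum(n)
    rate = x_tot / n_tot
    if pooled_decimals is not None:
        rate = round(rate, pooled_decimals)
    pooled = math.asin(math.sqrt(rate))
    q = []
    for x_j, n_j in zip(dead, n):
        diff = math.asin(math.sqrt(x_j / n_j)) - pooled
        # (1 - N_j/N)^(-1/2) * 2 sqrt(N_j) * [arcsin sqrt(X_j/N_j) - arcsin sqrt(X/N)]
        q.append(2.0 * math.sqrt(n_j) * diff / math.sqrt(1.0 - n_j / n_tot))
    return q


def plotting_positions(m):
    """Percent plotting positions 100 (i - 0.5) / m for i = 1, ..., m."""
    return [100.0 * (i - 0.5) / m for i in range(1, m + 1)]


# Deaths per beaker (5 daphnids each) for groups 4-6 of the Goulden test.
deaths = {4: [2, 1, 0], 5: [5, 2, 1], 6: [4, 4, 5]}
q_values = {g: arcsine_standardized(d, [5, 5, 5], pooled_decimals=3)
            for g, d in deaths.items()}
positions = plotting_positions(18)
```

Rounded to three decimals, q_values should reproduce the Q_j column for groups 4-6, and the first and last entries of positions are the 2.8 and 97.2 that bracket the ordered-value listing.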
The normal probability plot of these points is shown in Figure VI.2. A theoretical N(0,1) distribution line is drawn in for reference. All but the largest standardized values lie approximately on a straight line with slope slightly greater than that based on the assumptions of no beaker to beaker heterogeneity and no outliers (the slope is about 1.5, compared with 1.0 for the theoretical N(0,1) line). The departure from the standard normal distribution line is probably due to the small expected frequencies within each group (for the most part less than one), since the EXAX2 comparisons in Section III revealed no beaker to beaker variation, except possibly in group 5.

The  largest  standardized  value,  corresponding  to  the  beaker  in  group  5 
with  100  percent  mortality,  lies  somewhat  above  a  straight  line  fitted  to 
the  other  points.  This  suggests  that  it  is  larger  than  what  would  be 
expected  of  the  extreme  of  the  remaining  values  and  thus  may  be  too  large 
to  be  due  just  to  chance;  i.e.  it  may  be  an  outlier. 

The extreme point corresponds to Beaker A of group 5. The observed mortality rate there is 100 percent, where the normal approximation is least reasonable. Thus, before we infer that the point is in fact an outlier, we should compare the observed mortality rate in this beaker with that in its companion beakers to determine whether there is any statistical evidence of differences. One such comparison was made among the three beakers in group 5 as part of the EXAX2 analysis in Section III. There was a significant difference at the α = 0.07 level. Such a significance level does not offer much statistical evidence of an outlier, especially when compared with what might be expected of the most extreme significance level of six independent tests, even when nothing was going on (in that case the chance that at least one of the six reaches the 0.07 level is 1 - (0.93)^6 ≈ 0.35). We construct several more sensitive comparisons below.

Consider  the  results  in  group  5. 

Group 5

Replicate    A    B    C    Total
Dead         5    2    1      8
Live         0    3    4      7
Total        5    5    5     15

Beaker  A  is  the  suspected  outlier.  Compare  its  results  to  those  from  the 
other  two  beakers  pooled. 


                5A     Fourteen other beakers in groups 1-5    Total
Dead          5 = x                    10                        15
Live            0                      60                        60
Total           5                      70                        75

We  again  carry  out  an  exact  test  of  homogeneity  of  responses  by  means  of 
the  Fisher-Irwin  test.  If  Beaker  5A  in  fact  has  the  same  underlying 
mortality  probability  as  those  in  each  of  the  other  14  beakers  in  groups 
1-5,  the  probability  of  observing  a  table  as  extreme  as  the  one  above  just 
due  to  chance  can  be  calculated  from  the  hypergeometric  distribution  as 


$$P(X \ge 5) = \frac{\binom{15}{5}\binom{60}{0}}{\binom{75}{5}} = 1.74 \times 10^{-4}$$


Thus the approximate two tailed significance level is 2(1.74 x 10^-4) = 3.48 x 10^-4. Now, Beaker 5A was not chosen a priori. Taking the selection of beakers into account, and assuming homogeneity of response rates among all beakers in groups 1-5, we have

$$P(\text{most extreme of 15 beakers is more significant than the } 3.48 \times 10^{-4} \text{ level}) \le 15\,(3.48 \times 10^{-4}) = 0.0052$$
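The hypergeometric tail probability and the selection adjustment above can be reproduced in a few lines. This Python sketch is ours (the function name is hypothetical): it holds the table margins fixed as in the Fisher-Irwin test, doubles the one-sided tail to obtain the approximate two tailed level as the text does, and multiplies by the 15 beakers as a Bonferroni-type bound.

```python
from math import comb


def hypergeom_upper_tail(x, n_row, dead_total, grand_total):
    """P(X >= x), where X counts deaths among n_row daphnids drawn without
    replacement from grand_total daphnids, dead_total of which died."""
    live_total = grand_total - dead_total
    upper = min(n_row, dead_total)
    hits = sum(comb(dead_total, k) * comb(live_total, n_row - k)
               for k in range(x, upper + 1))
    return hits / comb(grand_total, n_row)


# Beaker 5A versus the fourteen other beakers in groups 1-5:
# 5 of 5 dead in 5A; 15 dead out of 75 daphnids overall.
p_one_tail = hypergeom_upper_tail(5, 5, 15, 75)
p_two_tail = 2 * p_one_tail        # approximate two tailed level
p_selected = 15 * p_two_tail       # bound allowing for choosing 1 of 15 beakers
print(f"{p_one_tail:.3g} {p_two_tail:.3g} {p_selected:.2g}")
```

The printed values correspond to the 1.74 x 10^-4, 3.48 x 10^-4 and 0.0052 quoted above. The bound is conservative because it ignores dependence among the 15 beaker comparisons.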

Thus, under the assumptions of this test, there is strong statistical evidence that the response in Beaker 5A is an outlier. However, the assumption of constant mortality rates across all five groups is a bit too strong. In particular, Beakers 5B and 5C show a 30 percent average mortality rate (3 of 10 daphnids), while the beakers in groups 1 to 4 have an average mortality rate of about 12 percent (7 of 60).

While these rates are not statistically significantly different, they do at least suggest that the assumption of constant mortality rates across the first five groups is questionable.

In  summary,  the  classification  of  the  response  in  Beaker  5A  as  an  outlier 
is  equivocal.  With  the  small  sample  sizes  at  hand  the  outlier  tests  are  not 
significant  (except  under  quite  stringent  assumptions).  Thus  in  the  absence 
of  specific  reasons  to  the  contrary  we  will  consider  the  responses  in  this 
beaker  to  be  valid.  Furthermore  in  the  absence  of  overall  beaker  to  beaker 
heterogeneity  within  groups  we  do  not  make  any  data  adjustments.  We  carry 
out  subsequent  analyses  with  the  data  as  presented. 

C.  OUTLIER  DETECTION  PROCEDURES  APPLIED  TO  REPRODUCTION  RESPONSES 

Offspring are counted on a per beaker basis. Thus, in the absence of a theoretical model upon which to base variance estimates (analogous to the binomial model for mortality data) or an assumption about the magnitude of beaker to beaker variation (e.g., that it is no greater than that for lengths), there is no basis on which to augment the beaker to beaker variation estimates with information about within beaker variation (as we did for mortality and length responses). Since such assumptions about theoretical models or about beaker to beaker variation are at best tenuous, we do not attempt them here and instead calculate variability estimates based on total results within each beaker (i.e., on a per beaker basis).

Outlier detection procedures are based on the residuals from one way analysis of variance models. Let R_ij denote the 21 day cumulative offspring per survivor in the j-th beaker of the i-th group. (In Chapman's test, with individual daphnids per beaker, attention is confined to those individual daphnids which survived for 21 days.) The one way analysis of variance model

$$R_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \ldots, I, \quad j = 1, \ldots, J$$

was specified, where R_ij denotes the total (21 day cumulative) offspring per survivor, μ is the overall mean, α_i is the fixed group effect, and ε_ij is the experimental variation. It is assumed that the ε_ij are independent N(0, σ²). The model was fitted to the data using the computer program BMDP1R in the BMDP statistical computing system [7]. The model fits and associated residual displays for the various data sets are presented below.
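For a one way layout, the least squares fit of this model simply sets each fitted value to its group mean, so the residuals and the error mean square can be illustrated without BMDP. The Python below is a minimal sketch under that assumption; the function name and the small data set are hypothetical and are not taken from any of the tests discussed here.

```python
from statistics import mean


def one_way_residuals(groups):
    """Least squares fit of R_ij = mu + alpha_i + e_ij for a one way layout.

    groups maps a group label to its list of responses. Returns the residuals
    R_ij - (group mean), the residual sum of squares, and its degrees of freedom.
    """
    residuals = {}
    ss_resid = 0.0
    n_obs = 0
    for label, values in groups.items():
        centre = mean(values)                  # fitted value mu + alpha_i
        residuals[label] = [v - centre for v in values]
        ss_resid += sum(r * r for r in residuals[label])
        n_obs += len(values)
    df_resid = n_obs - len(groups)             # total observations minus I
    return residuals, ss_resid, df_resid


# Hypothetical offspring counts, four beakers in each of two groups.
data = {1: [100, 110, 90, 100], 2: [80, 85, 75, 80]}
res, ss, df = one_way_residuals(data)
mse = ss / df                                  # estimate of sigma^2
```

Plotting the residuals by group and on normal probability paper then gives displays of the kind described in the following paragraphs.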

Figures VI.3 - VI.5 display the analysis of variance fit to the 21 day cumulative reproduction data in LeBlanc's Test A, the residuals plotted by group, and a normal probability plot of the ordered residuals. The individual responses are shown in Figure II.11. One of the residuals in group 5 stands out from the others in Figure VI.4. This residual does not lie on a straight line fitted by eye to the remaining residuals in Figure VI.5. It must thus be considered a potential outlier. To determine whether there is any statistical evidence that this extreme observation is in fact an outlier, we can test whether the most extreme of 18 independent normally distributed random variables with mean 0 and standard deviation 20.64 is likely to exceed 49.25 in absolute value. (The extreme value is 189 and thus corresponds to a deviation of 49.25 from its group average. The four responses corresponding to group 7 have been deleted from the calculations below because there is virtually 100 percent mortality in group 7, and so reproduction patterns may not be comparable to those from other groups. The standard deviation estimate is thus ((9667.25 - (49.25)^2)/17)^{1/2} = 20.64. We account for the correlations among the residuals from the same group by treating the J = 4 correlated residuals as if they were J - 1 = 3 independent normal deviates.) Thus

$$
\begin{aligned}
P[\text{most extreme of 18 independent random deviates is greater than 49.25 in absolute value}]
&= 1 - [P(-49.25 < X < 49.25)]^{18} = 1 - [2\Phi(49.25/20.64) - 1]^{18} \\
&= 1 - [2\Phi(2.386) - 1]^{18} = 1 - (0.9830)^{18} \approx 0.27
\end{aligned}
$$

where Φ denotes the standard normal distribution function.

There  is  thus  no  statistical  evidence  that  this  extreme  value  is  greater  than 
what  may  be  expected  just  due  to  random  variation. 
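The calculation above, and the similar ones that follow for LeBlanc's Test B and for the Goulden and Chapman tests, can be scripted as follows. This Python sketch uses hypothetical function names and assumes, as the text does, that the n residuals behave like independent normal deviates; math.erf supplies the standard normal distribution function Φ.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sd_without_extreme(ss_resid, extreme, df_resid):
    """Residual SD with the suspect residual removed: ((SS - d^2)/(df - 1))^(1/2)."""
    return math.sqrt((ss_resid - extreme ** 2) / (df_resid - 1))


def p_max_abs_exceeds(d, sd, n):
    """P(most extreme of n independent N(0, sd^2) deviates exceeds d in absolute value)."""
    inside = 2.0 * normal_cdf(d / sd) - 1.0    # P(-d < X < d) for one deviate
    return 1.0 - inside ** n


sd_a = sd_without_extreme(9667.25, 49.25, 18)  # LeBlanc Test A
p_a = p_max_abs_exceeds(49.25, sd_a, 18)
sd_b = sd_without_extreme(9697.5, 53.0, 21)    # LeBlanc Test B
p_b = p_max_abs_exceeds(53.0, sd_b, 21)
```

With these inputs sd_a and p_a come out near 20.64 and 0.27, and sd_b and p_b near 18.56 and 0.09. The same two functions cover the Goulden isophorone and Chapman beryllium calculations below.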


Figures VI.6 - VI.8 display the analysis of variance fit to the 21 day cumulative reproduction data in LeBlanc's Test B, the residuals plotted by group, and a normal probability plot of the ordered residuals. The individual responses are shown in Figure II.13. One of the residuals in group 3 stands out from the others in Figure VI.7. This residual does not lie on a straight line fitted by eye to the remaining residuals in the normal probability plot in Figure VI.8. It is thus a possible outlier. To determine whether there is any statistical evidence that this extreme observation is in fact an outlier, we can test whether the most extreme of 7(4-1) = 21 independent normally distributed random variables with mean 0 and standard deviation 18.56 (i.e., [(9697.5 - 53^2)/20]^{1/2}) is likely to exceed 53 in absolute value. (The extreme value is 169 and thus corresponds to a deviation of 53 from its group average.) Thus

$$
\begin{aligned}
P[\text{most extreme of 21 independent random normal deviates is greater than 53.0 in absolute value}]
&= 1 - [P(-53 < X < 53)]^{21} = 1 - [2\Phi(53/18.56) - 1]^{21} \\
&= 1 - [2\Phi(2.856) - 1]^{21} = 1 - (0.9957)^{21} \approx 0.09
\end{aligned}
$$

This is at most marginally statistically significant, and the situation is borderline. We will retain this observation in subsequent analyses.

It  is  interesting  to  note  the  great  similarity  in  reproduction  results 
in  Tests  A  and  B.  In  each  case  the  potential  outlying  value  was  on  the 
high  side.  Perhaps  the  characteristics  of  these  two  daphnids  and  conditions 
in  these  beakers  should  be  noted  and  repeated  in  future  tests,  to  the  extent 
possible,  so  as  to  obtain  increased  productivity. 

Figures VI.9 - VI.11 display the analysis of variance fit to the 21 day cumulative reproduction data in Adams' selenium test, the residuals plotted by group, and a normal probability plot of the ordered residuals. The individual responses are shown in Figure II.15. Since groups 5-8 have 100 percent mortality, we delete the residuals from these groups from the normal probability plot. None of the residuals in groups 1-4 stand out from the rest. There is thus no suggestion of outlying responses.

Figures VI.12 - VI.14 display the analysis of variance fit to the 21 day cumulative reproduction data in Chapman's beryllium test, the residuals plotted by group, and a normal probability plot of the ordered residuals.

The individual responses are shown in Figure II.17. Since in this test the daphnids are housed individually, the reproduction figures pertain to individual daphnids. Comparisons are restricted to those daphnids that survived to the end of the test. None of the residuals stand out from the rest. There is thus no suggestion of outlying responses.

Figures VI.15 - VI.17 display the analysis of variance fit to the 21 day cumulative reproduction data in Goulden's isophorone test, the residuals plotted by group, and a normal probability plot of the ordered residuals.

The individual responses are shown in Figure II.22. Attention is confined to the individually housed daphnids that survived to the end of the test. Since just one daphnid in group 6 survived to the end of the test (and produced no offspring), this response is deleted from the calculations below, because the fertility patterns in this highest concentration group may not be comparable with those from the other groups. In groups 1-5, just one individually housed daphnid died. One of the residuals in group 1 stands out from the others in Figure VI.16. This residual does not lie on a straight line fitted by eye to the remaining residuals in Figure VI.17. It is thus a possible outlier. (This point corresponds to a daphnid that produced 135 offspring, while the average production in the control group was 77.29.) To determine whether there is any statistical evidence that this extreme point is in fact an outlier, we can test whether the most extreme of 5(7-1) - 1 = 29 independent normally distributed random variables with mean 0 and standard deviation 14.38 (i.e., [(9120.548 - 57.71^2)/28]^{1/2}) is likely to exceed 57.71 in absolute value (135 - 77.29 = 57.71). Thus

$$
\begin{aligned}
P[\text{most extreme of 29 independent random deviates is greater than 57.71 in absolute value}]
&= 1 - [P(-57.71 < X < 57.71)]^{29} = 1 - [2\Phi(57.71/14.38) - 1]^{29} \\
&= 1 - [2\Phi(4.013) - 1]^{29} = 1 - (0.99994)^{29} \approx 0.002
\end{aligned}
$$

(Note that it might be argued that 14.38 is an underestimate of variability, since it excludes the extreme residual, 57.71. If we repeat the above probability calculation using the standard deviation estimate 17.73 (from Figure VI.15), the extreme point is significant at the α = 0.03 level.)

There is thus statistical evidence that this point is in excess of what is to be expected just due to random variation.

The above calculations, coupled with the appearances of Figures II.22, VI.16 and VI.17, suggest that this point be deleted from subsequent comparisons. This decision may well affect whether groups 4 and 5 are considered to differ significantly from the control group. As such, the decision whether to include or exclude this point from subsequent comparisons should also be based on biological judgement and knowledge of experimental details. If the extreme result represents normal biological variation, then perhaps the response should be considered with the others. Outlier detection procedures are merely screening devices to direct attention to those places where biological judgement should be applied.

For  the  sake  of  illustration  we  have  chosen  to  exclude  this  observation 
from  subsequent  comparisons.  In  actual  statistical  analyses  to  support 
regulatory  applications  or  regulatory  decisions  the  subsequent  analyses 
might  be  carried  out  both  with  this  point  in  and  out.  Any  biologically 
important  differences  in  analysis  results  would  need  to  be  resolved  on 
biological  grounds. 
D. OUTLIER DETECTION PROCEDURES APPLIED TO LENGTH RESPONSES

Lengths are measured at periodic intervals on individual daphnids. In the data sets discussed in this report, length determinations were made by LeBlanc (in Tests A and B) on days 7 and 21 and by Chapman (in his test on beryllium) on day 21. In LeBlanc's tests we can think of outlying beaker averages relative to their group averages as well as outlying individual lengths relative to their beaker averages. In Chapman's test each beaker contains just a single daphnid, and so we need only consider outlying individual lengths around their group means.

We first consider outlier detection procedures for the beaker averages about their group means in LeBlanc's Tests A and B. From the discussion in Subsection V.D, we associate with the beaker averages estimates of variability based on the mean squares for beakers within groups, with degrees of freedom I(Jx - 1) as discussed there. Let N_j denote the sample size within the j-th beaker of the i-th group, N = Σ_j N_j, and let σ_b² and σ_w² denote the components of variance due to beakers and to daphnids within beakers, respectively. Let X_j denote the average length within the j-th beaker and X̄ = Σ_j N_j X_j / N denote the (weighted) average daphnid length within the i-th group. The standard error of X_j is [(σ_w² + N_j σ_b²)/N_j]^{1/2}. We approximate σ_w² + N_j σ_b² by the mean square for beakers within groups, MSI, as discussed in Subsection V.D. (This approximation would be exact if all the sample sizes N_j were equal across beakers and across groups.) The values of MSI were calculated in Subsection V.D to be

Test A: MSI = 0.2722 with 98 degrees of freedom

Test B: MSI = 0.2707 with 141 degrees of freedom

The residuals of the beaker averages about their group averages can be calculated from Figures IV.2 - IV.7 for Test A and from Figures IV.9 - IV.15 for Test B. The standard errors of the residuals are approximated by [(1 - N_j/N) MSI/N_j]^{1/2}. (This approximation would be exact if all the N_j's were equal across beakers and across groups.) Tables VI.1 and VI.2 contain the ordered residuals multiplied by the factors N_j^{1/2}(1 - N_j/N)^{-1/2}, so as to have approximately constant variance. These tables also contain group and beaker identification and plotting positions appropriate for preparing normal probability plots. The normal probability plots of these standardized residuals appear in Figures VI.18 and VI.19. The reference lines on these plots correspond to normal distributions with mean 0 and variance MSI. In both tests the beaker averages conform nicely to the reference lines. Thus there is no suggestion of outlying average beaker lengths within any treatment or control group.
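The standardization of beaker averages described above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up beaker data: it weights the group average by the beaker sample sizes N_j and rescales each residual by N_j^{1/2}(1 - N_j/N)^{-1/2}, so that, under the approximations in the text, the results have variance close to MSI.

```python
import math


def standardized_beaker_residuals(beaker_means, beaker_sizes):
    """Residuals of beaker averages about the N_j-weighted group average,
    multiplied by sqrt(N_j) / sqrt(1 - N_j / N) to roughly equalize variance."""
    n_total = sum(beaker_sizes)
    group_mean = sum(n * x for n, x in zip(beaker_sizes, beaker_means)) / n_total
    return [(x - group_mean) * math.sqrt(n) / math.sqrt(1.0 - n / n_total)
            for x, n in zip(beaker_means, beaker_sizes)]


# Hypothetical 21 day average lengths (mm) and survivor counts for one group.
means = [3.9, 4.1, 4.0]
sizes = [4, 4, 2]
z = standardized_beaker_residuals(means, sizes)
```

Pooling such values over all groups, ordering them, and plotting them on normal probability paper against a reference line with variance MSI gives displays of the kind described above.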

TABLE VI.1. LE BLANC TEST A - ORDERED STANDARDIZED RESIDUALS OF BEAKER AVERAGE LENGTHS FROM OVERALL GROUP AVERAGE LENGTHS - TO DETECT OUTLYING BEAKER AVERAGES (GROUPS 1-6 ONLY)

TABLE VI.2. LE BLANC TEST B - ORDERED STANDARDIZED RESIDUALS OF BEAKER AVERAGE LENGTHS FROM OVERALL GROUP AVERAGE LENGTHS - TO DETECT OUTLYING BEAKER AVERAGES

We next consider outlier detection procedures for individual (21 day) daphnid lengths. In LeBlanc's tests this involves studying the deviations of individual lengths within each beaker about their beaker averages. In Chapman's test this involves studying the deviations of individual daphnid lengths about their group averages.

The approach is much like that used for outlier detection with reproduction responses. Namely, the outlier detection procedures are based on the residuals from one way analysis of variance models. For LeBlanc's Tests A and B, consider each beaker (with one or more survivors at 21 days) as a separate classification, irrespective of group. Thus for Test A there are 25 classifications, while for Test B there are 28 classifications. Let L_jk denote the 21 day length of the k-th surviving daphnid in the j-th beaker. For Chapman's test, with just one daphnid per beaker, let L_jk denote the 21 day length of the k-th surviving daphnid in the j-th (treatment or control) group. The one way analysis of variance model

$$L_{jk} = \mu + \alpha_j + \varepsilon_{jk}$$

was specified, where L_jk denotes the 21 day length, μ is the overall mean, α_j is the fixed beaker or group effect, and ε_jk is the experimental random variation. It is assumed that the ε_jk are independent N(0, σ²). The model was fitted to the data using the computer program BMDP1R in the BMDP statistical computing system [7]. The model fits and associated residual displays are presented below.

Figures VI.20 - VI.22 display the analysis of variance fit to the 21 day lengths of survivors in LeBlanc's Test A, the residuals plotted by classification, and a normal probability plot of the ordered residuals.

None of the residuals appear to stand apart from the others in Figure VI.21. A straight line fitted by eye accommodates all the residuals in Figure VI.22. The residuals thus appear to follow an approximate normal distribution, and there is no suggestion of any outlying lengths.

Figures VI.23 - VI.25 display the analysis of variance fit to the 21 day lengths of survivors in LeBlanc's Test B, the residuals plotted by classification, and a normal probability plot of the ordered residuals. Again, none of the residuals stand apart from the others in Figure VI.24 or deviate from the straight line in Figure VI.25. Thus there is no suggestion of any outlying lengths in Test B.

Figures VI.26 - VI.28 display the analysis of variance fit to the 21 day lengths of survivors in Chapman's beryllium test, the residuals plotted by group, and a normal probability plot of the ordered residuals. Four of the residuals (one from group 5, one from group 6, and two from group 7) appear to be removed from the others, on the low side, in Figure VI.27.

These residuals are also removed from a straight line fitted to the remaining ones in Figure VI.28. There is thus a definite suggestion that the four daphnids corresponding to these cases may be outliers on the low side. That is, those daphnids may be dwarfed, relative to their group averages, more than could be expected of the most extreme of 63 deviations just due to chance. To determine whether there is any statistical evidence that the most extreme of these residuals is in fact an outlier, we can test whether the most extreme of 63 - 8 = 55 independent normally distributed random variables with mean 0 and standard deviation 0.275 (as determined from the straight line drawn to the majority of the values in Figure VI.28) is likely to exceed 0.8167 in absolute value. (The extreme length is 3.0000 mm and the group average is 3.8167 mm.) Thus

$$
\begin{aligned}
P[\text{most extreme of 55 independent random normal deviates is greater than 0.8167 in absolute value}]
&= 1 - [P(-0.8167 < X < 0.8167)]^{55} \\
&= 1 - [2\Phi(0.8167/0.275) - 1]^{55} \\
&= 1 - [2\Phi(2.9698) - 1]^{55} = 1 - (0.9970)^{55} \\
&= 1 - 0.85 = 0.15
\end{aligned}
$$

The  degree  of  statistical  evidence  is  marginal  and  so  the  situation  is 
borderline.  Let's  consider  the  second  most  extreme  residual.  This  residual 
has  value  -0.7900. 

$$
\begin{aligned}
P[\text{second most extreme of 55 independent random normal deviates is greater than 0.79 in absolute value}]
&= 1 - [P(-0.79 < X < 0.79)]^{55} - 55\,[P(-0.79 < X < 0.79)]^{54}\,[2P(X > 0.79)] \\
&= 1 - [2\Phi(0.79/0.275) - 1]^{55} - 55\,[2\Phi(0.79/0.275) - 1]^{54}\,[2(1 - \Phi(0.79/0.275))] \approx 0.02
\end{aligned}
$$
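The second most extreme calculation generalizes: under the same independence assumption, the number of the n deviates exceeding d in absolute value is binomial with success probability q = 2[1 - Φ(d/sd)], so the probability that the k-th most extreme deviate exceeds d is one minus the probability of fewer than k exceedances. The Python sketch below (hypothetical function name) reduces to the earlier formula for k = 1 and to the one above for k = 2.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_kth_extreme_exceeds(d, sd, n, k):
    """P(k-th most extreme of n independent N(0, sd^2) deviates exceeds d in
    absolute value) = P(at least k of the n deviates satisfy |X| > d)."""
    q = 2.0 * (1.0 - normal_cdf(d / sd))    # P(|X| > d) for one deviate
    p = 1.0 - q                             # P(-d < X < d)
    fewer_than_k = sum(math.comb(n, j) * q ** j * p ** (n - j) for j in range(k))
    return 1.0 - fewer_than_k


p_first = p_kth_extreme_exceeds(0.8167, 0.275, 55, 1)   # most extreme residual
p_second = p_kth_extreme_exceeds(0.79, 0.275, 55, 2)    # second most extreme
```

These should reproduce the 0.15 and 0.02 obtained above. Like the text, the sketch takes the standard deviation 0.275 as known, which ignores the uncertainty in that graphical estimate.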

Thus  the  second  extreme  residual  is  more  highly  statistically  significant. 
This  suggests  that  there  may  be  some  physical  reason  underlying  the  extreme 
values  on  the  low  side.  Basic  records  should  be  checked  by  the  investigator 
to  try  to  identify  a  reason.  The  proper  action  would  depend  on  the  physical 
explanation.  In  the  absence  of  such  information  we  choose  to  regard  these 
observations  as  valid  and  carry  out  subsequent  analyses  incorporating  them. 
Figure VI.1. Normal probability plot of standardized values based on LeBlanc Test - 21 day mortality - to detect outlying responses

Figure VI.2. Normal probability plot of standardized values based on Goulden Isophorone test - 21 day mortality - to detect outlying responses
Figure VI.3. Analysis of variance fit of 21 day cumulative reproduction on group. LeBlanc test A.

Figure VI.4. Residuals from analysis of variance fit versus group. LeBlanc test A.

Figure VI.5. Normal probability plot of residuals from analysis of variance fit. LeBlanc test A.

... group. LeBlanc test B.
Figure VI.9. Analysis of variance fit of 21 day cumulative reproduction on group. Adams' selenium test.

Figure VI.11. Normal probability plot of residuals from analysis of variance fit. Adams' selenium test. Groups 1-4 only.

Figure VI.12. Analysis of variance fit of 21 day cumulative reproduction on group. Chapman beryllium test. Surviving daphnids only.

Figure VI.14. Normal probability plot of residuals from analysis of variance fit. Chapman beryllium test. Surviving daphnids only.

Figure VI.15. Analysis of variance fit of 21 day cumulative reproduction on group. Goulden isophorone test. Surviving, individually housed daphnids only.

Figure VI.16. Residuals from analysis of variance fit versus group. Goulden isophorone test. Surviving, individually housed daphnids only.

Figure VI.17. Normal probability plot of residuals from analysis of variance fit. Goulden isophorone test. Surviving, individually housed daphnids only.
Figure VI.18. Normal probability plot of standardized residuals - LeBlanc Test A - 21 day average lengths - to detect outlying beaker averages

Figure VI.19. Normal probability plot of standardized residuals - LeBlanc Test B - 21 day average lengths - to detect outlying beaker averages

Figure VI.20. Analysis of variance fit of 21 day lengths of surviving daphnids on beaker classification. LeBlanc test A.

Figure VI.22. Normal probability plot of residuals from analysis of variance fit. LeBlanc test A.

Figure VI.23. Analysis of variance fit of 21 day lengths of surviving daphnids on beaker classification. LeBlanc test B.

Figure VI.25. Normal probability plot of residuals from analysis of variance fit. LeBlanc test B.

... group. Chapman beryllium test.

Figure VI.27. Residuals from analysis of variance fit versus group. Chapman beryllium test.

Figure VI.28. Normal probability plot of residuals from analysis of variance fit. Chapman beryllium test.


VII.  COMPARISON  OF  WATER  CONTROL  AND  SOLVENT  CONTROL  GROUPS 


A.  INTRODUCTION  AND  BACKGROUND 

The  toxicant  under  study  may  not  be  water  soluble  or  may  be  soluble  only 
at  concentrations  lower  than  those  used  in  the  test.  In  such  a  situation 
the  toxicant  must  be  dissolved  in  a  solvent  to  prevent  it  from  precipitating 
out  of  the  water  during  the  course  of  exposure.  The  substance  under  study 
is  thus  a  combination  of  the  toxicant  of  interest  and  the  solvent  used. 

The  usual  method  of  procedure  is  to  prepare  a  relatively  highly  concentrated 
toxicant-solvent  solution  and  utilize  successive  dilutions  with  water  to 
arrive  at  the  appropriate  test  concentrations.  Thus  the  solvent  is  diluted 
along  with  the  toxicant  among  the  various  test  groups. 

The utilization and successive dilution of solvents complicate the interpretation of toxic responses observed in a test. Since the solvent and toxicant are paired and diluted together, there is no way to sort out whether the observed effects are due to the toxicant, to the solvent, or to the pair. There is really no way to infer, solely on the basis of tests utilizing solvents, how the toxicant would act in the field in the absence of a solvent. (Note that concentrations above solubility levels can occur in the field due to effluents or spills.)

A partial solution to this dilemma can be obtained by studying the effects of the solvent alone. If the solvent by itself produces no toxic responses at the concentrations utilized in the test, the assumption is made that any toxic effects observed in the test can be attributed to the toxicant. This assumption may or may not be valid in any given situation. The effects of the toxicant and the solvent may superimpose upon one another or even interact. The determination of the presence or extent of such joint effects would require special studies and special statistical analyses.

Since  data  from  tests  to  study  joint  solvent-toxicant  effects  are  not 
available  and  since  exploration  of  this  question  is  beyond  the  scope  of  this 
project,  we  content  ourselves  with  a  much  more  limited  treatment  of  the 
problem.  Namely,  preliminary  statistical  tests  are  carried  out  to  compare 
the  average  survival,  length,  and  reproduction  responses  between  the  solvent 
and  water  control  groups.  If  no  statistically  significant  differences  are 
found  then  we  act  as  if  there  are  no  differences  and  we  pool  data  across 
control  groups.  The  combined  responses  are  used  for  comparison  with  the 
treatment  group  responses.  If  statistically  significant  differences  are 
found  then  treatment  group  responses  are  compared  either  just  to  the  solvent 
control  group  responses  or  else  separately  to  both  the  water  control  group 
and  to  the  solvent  control  group  responses.  Although  this  is  a  commonly 
used  approach  to  dealing  with  this  question,  it  leaves  much  to  be  desired 
and  raises  a  number  of  important  conceptual  issues. 
First of all, the presence or absence of statistically significant
differences in responses between water and solvent control groups is not
the same thing as the presence or absence of biologically significant differences
in responses between these groups. Statistical significance is a
function of sample size as well as magnitude of effect. Thus the absence
of a statistically significant difference between these groups does not
imply that no difference exists. A biologically important difference might
be revealed if greater numbers of daphnids or beakers were used.
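As a rough illustration of the point that statistical significance depends on sample size as well as on the size of an effect, the Python sketch below holds two hypothetical mortality rates (15 and 5 percent) fixed and increases the number of daphnids per group. It uses the pooled two-proportion z-test, which is equivalent to the uncorrected Pearson chi square test on a 2 by 2 table. The rates and group sizes are assumptions chosen for illustration, not data from these tests.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two sided p-value of the pooled two-proportion z-test.

    z**2 equals the uncorrected Pearson chi square statistic for the
    corresponding 2 by 2 table, so the two tests agree exactly.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two sided normal tail area: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical mortality rates of 15% and 5%, held fixed while n grows.
results = {}
for n in (20, 80, 320):
    results[n] = two_proportion_p_value(0.15 * n, n, 0.05 * n, n)
    print(f"n = {n:3d} per group: p = {results[n]:.4f}")
```

With these assumed rates the same difference is clearly nonsignificant at 20 daphnids per group and highly significant at 320 per group, even though the underlying effect is unchanged.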

Secondly, if the solvent is shown to have an effect, as evidenced by
(statistically significant) differences between the solvent control and
water control group responses, then even greater conceptual issues arise.
Namely, the toxicant effects and the solvent effects are confounded and
there  is  no  way  to  sort  them  out.  The  interpretational  problem  is  further 
complicated  if  responses  in  certain  treatment  groups  are  significantly 
(biologically  or  statistically)  different  from  one  control  group  response 
but not from the other. Which comparison should be utilized? We sidestep
the problem in this report by comparing treatment groups with both the solvent
and the water control groups if differences exist. We then note any differences in
conclusions.  However  this  does  not  answer  the  fundamental  question  of 
which  comparison  is  more  appropriate  for  regulatory  or  reporting  purposes. 
That  question  cannot  be  answered  solely  on  statistical  grounds  based  on 
the  test  data  available.  Biological  judgement  must  be  used  and/or  additional 
tests  must  be  carried  out  to  study  the  effects  of  the  solvent  by  itself.  For 
our  purposes  we  simply  note  the  differences  between  the  two  control  group 
responses,  carry  out  separate  comparisons  with  each  control  group,  and 
remark  that  the  results  of  the  test  are  somewhat  ambiguous.  Any  further 
interpretation  would  require  additional  testing  and/or  biological  judgement. 

Among  the  data  sets  considered  in  this  report,  LeBlanc's  Tests  A  and  B 
and  Chapman's  Beryllium  Test  utilize  both  solvent  and  water  control  groups. 

As  will  be  seen  below,  significant  differences  between  the  solvent  and 
water  control  groups  in  LeBlanc's  Test  A  arise  with  respect  to  the  survival, 
reproduction,  and  length  responses.  Subsections  B,  C  and  D  below  pertain  to 
comparisons  between  the  control  groups  in  LeBlanc's  Tests  A  and  B  and  in 
Chapman's  test  respectively. 

B.  LE  BLANC  TEST  A 

Survival 

The mortality rates in the water control group (group 1) and in the
solvent control group (group 2) are compared by carrying out a 2 by 2 contingency
table test for homogeneity utilizing the BMDP program BMDP1F. The
Pearson chi square test, with and without the continuity correction, and
Fisher's exact test are reported. Two tailed tests are used. The output
from these analyses is shown in Figure VII.1.

For the purpose of simplicity, analyses were first carried out based on
the original (i.e., unadjusted for beaker to beaker heterogeneity) responses.
If these comparisons show no differences, then comparisons based on the
adjusted responses surely would not (the adjustment reduces the effective
sample sizes and so can only weaken the evidence of a difference). If
significant differences are found, then we must redo the analyses after
adjusting for beaker to beaker heterogeneity (based on the adjustments in Section V).

We see from Figure VII.1 that there was 15 percent mortality in the
water control group (12 of 80) and just 3.75 percent mortality in the solvent
control group (3 of 80). Under the hypothesis of homogeneous mortality rates
in the two groups, the minimum expected cell frequency is 7.5 (15 deaths among
160 daphnids, shared equally between two groups of 80), thus suggesting
that asymptotic theory is reasonable. The uncorrected chi square statistic
is significant at the α = 0.0146 level while the corrected chi square statistic
is significant at the α = 0.0300 level, in good agreement with Fisher's
exact test. There is thus statistical evidence of differences in mortality
rates between the two control groups, the solvent control group having a
more favorable rate than the water control group.
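The uncorrected and Yates corrected chi square statistics, and the minimum expected cell frequency, can be recomputed from the counts quoted above with a short Python sketch. This is not the BMDP1F implementation. It uses the textbook 2 by 2 shortcut formula and the fact that a chi square variable with 1 d.f. is the square of a standard normal variable, so its upper tail area is erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi square for the 2 by 2 table [[a, b], [c, d]].

    Returns (statistic, two tailed p-value, smallest expected cell count
    under homogeneity). With yates=True the continuity correction is used.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    min_expected = min(r * k / n for r in rows for k in cols)
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2.0, 0.0)
    stat = n * diff ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    # A chi square variable with 1 d.f. is Z**2, so P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, min_expected


# LeBlanc Test A controls: rows are (water, solvent), columns are (dead, alive).
counts = (12, 68, 3, 77)
stat_u, p_u, min_e = chi_square_2x2(*counts)
stat_c, p_c, _ = chi_square_2x2(*counts, yates=True)
print(f"smallest expected frequency: {min_e:.1f}")
print(f"uncorrected: chi2 = {stat_u:.3f}, p = {p_u:.4f}")
print(f"corrected:   chi2 = {stat_c:.3f}, p = {p_c:.4f}")
```

Run on the Test A counts, this reproduces a minimum expected frequency of 7.5 and significance levels of about 0.0146 (uncorrected) and 0.0300 (corrected).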

The  above  tests  did  not  account  for  the  presence  of  beaker  to  beaker 
heterogeneity,  which  was  shown  in  Section  V  to  exist.  From  Table  V.3  we 
see  that  the  effective  sample  sizes  and  numbers  of  responses  in  groups  1 
and  2,  after  adjustment  for  the  effects  of  beaker  to  beaker  heterogeneity, 
are: 


at the α = 0.041 level. The Yates corrected chi square statistic is
χ² = 3.14, which is significant at the α = 0.081 level. Since we associate
30 error d.f. with each group (see Table V.3), the "chi square" statistics
are compared to the percentiles of an F-distribution with degrees of freedom
1 and 60.
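Referring a statistic to an F-distribution with 1 and 60 degrees of freedom requires the F upper tail area. A minimal pure Python sketch is given below, assuming no statistics library is at hand. It evaluates the regularized incomplete beta function by the continued fraction method described in Numerical Recipes; the function names are our own. Applied to the corrected statistic of 3.14, it should give a significance level close to the 0.081 quoted above.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # use the continued fraction directly where it converges fastest
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper tail area P(F > f) for an F-distribution with (d1, d2) d.f."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


p_corrected = f_sf(3.14, 1, 60)
print(f"P(F(1, 60) > 3.14) = {p_corrected:.3f}")
```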

We  thus  conclude  that  there  is  statistical  evidence  of  differences  in 
mortality  rates  between  the  water  control  and  solvent  control  groups.  The 
solvent  control  group  has  lower  mortality  than  the  water  control  group. 

Reproduction 

The  21  day  cumulative  offspring  per  surviving  adult  in  the  water  control 
group  (group  1)  and  in  the  solvent  control  group  (group  2)  are  compared 
either  by  carrying  out  a  two  sample,  two  tailed  t-test  or  equivalently  by 
a  one  way  analysis  of  variance  with  two  groups.  Output  from  the  latter 
analysis, utilizing the SPSS ANOVA procedure, is shown in Figure VII.2.
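The equivalence of the two sample (pooled variance) t-test and the one way analysis of variance with two groups can be checked numerically: the square of the t statistic equals the F-ratio, and the F-ratio has 1 and N - 2 degrees of freedom. The per beaker values below are hypothetical and serve only to exercise the formulas.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two sample t statistic using the pooled within group variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def one_way_anova_f(groups):
    """F-ratio (between mean square / within mean square) and its d.f."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical per beaker responses, four beakers per control group.
water = [62.0, 70.5, 58.0, 66.5]
solvent = [95.0, 101.5, 88.0, 99.5]
t = pooled_t(water, solvent)
f, df_b, df_w = one_way_anova_f([water, solvent])
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f} on ({df_b}, {df_w}) d.f.")
```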

Recall that these responses are obtained for each beaker by accumulating,
day by day, the total number of offspring produced on that day divided by the
number of daphnids alive on that day.
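The accumulation just described can be written as a small Python function. The daily record below is hypothetical, and skipping days on which no adults are alive is our own assumption; the text does not say how such days are handled.

```python
def cumulative_offspring_per_survivor(daily_offspring, daily_alive):
    """Sum over days of (young produced that day) / (adults alive that day)
    for one beaker. Days with no living adults are skipped (an assumption)."""
    total = 0.0
    for young, alive in zip(daily_offspring, daily_alive):
        if alive > 0:
            total += young / alive
    return total


# Hypothetical five day record for a single beaker (a real test runs 21 days).
young = [0, 0, 40, 38, 57]
alive = [20, 20, 20, 19, 19]
print(cumulative_offspring_per_survivor(young, alive))
```

For this made-up record the running total is 40/20 + 38/19 + 57/19 = 7.0 young per surviving adult.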
The analysis in Figure VII.2 is carried out based on an error yardstick
calculated only from the results in the two control groups (and thus having
8 - 2 = 6 d.f.). On the basis of this analysis we see that there is very strong
statistical evidence of differences in production rates in these two groups.
The production rate in the solvent control group is about 50 percent greater
than that in the water control group. Thus unless there is some systematic
difference between the water control and solvent control daphnids, either
with respect to biological hardiness or experimental handling, it appears
as if the solvent is associated with increased production.

Figure VI.3 shows the results of an analysis of variance fit to all the
groups. On the basis of this fit an error mean square of 460.345 with 21 d.f.
is estimated. This is somewhat greater than the mean square of 278.1667 with
6 d.f. estimated from the fit in Figure VII.2 (although not significantly
greater at α = 0.10). An additional test was carried out using the larger
error mean square. The difference between the production rates in the solvent
and the water control groups is still very strongly significant (α = 0.000,
i.e., less than 0.001).
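The check that 460.345 (21 d.f.) is not significantly greater than 278.1667 (6 d.f.) is a variance-ratio F-test. The sketch below computes the one sided upper tail area in pure Python via the regularized incomplete beta function (a Numerical Recipes style continued fraction; the helper names are ours). Treating the test as one sided is an assumption, and strictly the two estimates are not independent, since the 21 d.f. fit includes the control beakers, so this is only a rough check.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper tail area P(F > f) for an F-distribution with (d1, d2) d.f."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


ms_all_groups, df_all = 460.345, 21     # fit to all groups (Figure VI.3)
ms_controls, df_controls = 278.1667, 6  # control groups only (Figure VII.2)
ratio = ms_all_groups / ms_controls
p_ratio = f_sf(ratio, df_all, df_controls)
print(f"F = {ratio:.3f} on ({df_all}, {df_controls}) d.f., one sided p = {p_ratio:.3f}")
```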

Length 

The 21 day lengths of the surviving adults in the water control group
and in the solvent control group are compared either by carrying out a two
sample, two tailed t-test or equivalently by a one way analysis of variance
with two groups. Output from the latter analysis, utilizing the SPSS
ANOVA procedure, is shown in Figure VII.3.

A  basic  difference  between  the  length  and  reproduction  responses,  at 
least  for  multiply  housed  daphnids,  is  that  lengths  are  measured  on  a  per 
daphnid  basis  while  reproduction  is  determined  on  a  per  beaker  basis. 

Thus comparisons of lengths could be carried out on a per beaker basis,
with one degree of freedom per beaker, or on a per daphnid basis, with
one degree of freedom per daphnid (there were a total of 145 surviving
daphnids in the two control groups). It was shown in Subsection IV.B
that there is strong statistical evidence of beaker to beaker variation
in lengths. Daphnids sharing a beaker therefore do not provide independent
responses, and a per daphnid analysis would overstate the effective degrees
of freedom. It would thus not be appropriate without somehow adjusting for
the beaker to beaker heterogeneity. A method for carrying out such
adjustments is discussed in Subsection V.D.
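To see why the per daphnid analysis can mislead when beakers differ, the sketch below analyzes the same hypothetical lengths two ways: first collapsing each beaker to its average (one observation per beaker, giving 8 - 2 = 6 error d.f. for four beakers per group), then treating every daphnid as independent. The lengths are invented, with deliberate beaker to beaker differences; they are not LeBlanc's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F-ratio of a one way analysis of variance and its (between, within) d.f."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical 21 day lengths (mm); each inner list holds the survivors of one beaker.
water_beakers = [[4.0, 4.1, 4.0], [4.4, 4.5, 4.4], [3.9, 4.0, 3.9], [4.3, 4.2, 4.3]]
solvent_beakers = [[4.5, 4.6, 4.5], [4.1, 4.2, 4.1], [4.6, 4.7, 4.6], [4.2, 4.3, 4.2]]

# Per beaker basis: the beaker average is the unit of analysis.
per_beaker = [[mean(b) for b in water_beakers], [mean(b) for b in solvent_beakers]]
f_beaker = one_way_anova_f(per_beaker)

# Per daphnid basis: every length treated as an independent observation,
# which ignores the beaker to beaker heterogeneity.
per_daphnid = [[v for b in water_beakers for v in b],
               [v for b in solvent_beakers for v in b]]
f_daphnid = one_way_anova_f(per_daphnid)

print("per beaker:  F = %.2f on %d and %d d.f." % f_beaker)
print("per daphnid: F = %.2f on %d and %d d.f." % f_daphnid)
```

With these invented values the per daphnid F-ratio is several times the per beaker one, because the between beaker variation is treated as if it were independent daphnid to daphnid noise spread over 22 error d.f. instead of 6.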

The analysis in Figure VII.3 is carried out on a per beaker basis,
based on an error yardstick calculated only from the results in the two
control groups (and thus having 6 d.f.). On the basis of this analysis
we see that there is very strong statistical evidence of differences in
average lengths in these two groups. The average length in the solvent
control group is greater than that in the water control group. Thus unless
there is some systematic difference in hardiness or in handling
between the water control and the solvent control daphnids, it appears
as if solvent is associated with increased lengths.


A suggested variability estimate, for comparisons of average lengths
among groups, is discussed in Subsection V.D. The estimate is based on
the variability among beaker averages within groups (actually the mean
square between beakers within groups, as discussed in Subsection IV.B) with
degrees of freedom based on pooling information from the variability among
beakers within groups and the variability among daphnids within beakers.

The suggested variability estimate is MSI = 0.2722 with 98 d.f. This variability
estimate is normalized to be on a per daphnid basis. In order to
apply it to average lengths within beakers we must divide it by the sample size
within beakers (the variance of an average of n observations is the per
observation variance divided by n). The average number of surviving adults per
beaker in the two groups is 145/8 = 18.125. Thus the error estimate, normalized
to a per average basis, is 0.2722/18.125 = 0.0150. (This compares with 0.0138 in
Figure VII.3.) An additional test was carried out using the alternative
error mean square with 98 d.f. Using this error yardstick yields the same
conclusions as in Figure VII.3, namely that there is very strong statistical
evidence (α = 0.0005) of differences in average lengths between the two
control groups.

C.  LE  BLANC  TEST  B 

The  comparisons  between  the  solvent  and  water  control  groups  are  carried 
out  in  the  same  manner  as  those  discussed  in  the  previous  subsection  for 
LeBlanc  Test  A.  The  discussion  here  will  thus  be  less  detailed. 

Survival 


The mortality responses in the water control and in the solvent control
groups are compared by carrying out a 2 by 2 contingency table test for
heterogeneity. Two tailed tests are used. The output is shown in Figure VII.4.

The analysis in Figure VII.4 is based on the original responses, unadjusted
for beaker to beaker heterogeneity. The conclusion, based on the
chi square test with or without correction or based on Fisher's exact test,
is that there is no statistical evidence of differences in mortality rates
between the solvent and water control groups (α = 0.349). Since adjusting
for beaker to beaker heterogeneity reduces the effective sample sizes, it
would only increase the observed significance level and so will not change
our conclusions. This adjustment is thus omitted.

Reproduction 

The 21 day cumulative offspring per surviving adult in the water control
and solvent control groups are compared in the same manner as was done for
the Test A responses, utilizing either a two tailed t-test or a one way
analysis of variance test. The output appears in Figure VII.5. The
analysis in Figure VII.5 is carried out based on an error yardstick calculated
only from the results in the two control groups (and thus having
6 d.f.). On the basis of this analysis we conclude that there is no
statistical evidence of differences in production rates in the two groups.

Figure VI.6 shows the results of an analysis of variance fit to all
the groups. On the basis of this fit an error mean square of 461.786 with
21 d.f. is estimated. If we recalculate the F-ratio using this error
yardstick we obtain

F = 40.50/461.786 = 0.088

When compared to an F-distribution with degrees of freedom 1 and 21, the
observed significance level is α = 0.770. Thus, the conclusions are
unchanged.


Length 

The  21  day  lengths  of  the  surviving  adults  in  the  water  control  and 
solvent  control  groups  are  compared  by  either  carrying  out  a  two  sample, 
two  tailed  t-test  or  equivalently  by  a  one  way  analysis  of  variance  with 
two  groups.  Output  from  the  latter  analysis,  utilizing  the  SPSS  ANOVA 
procedure, is shown in Figure VII.6.

It was shown in Subsection IV.B that there is strong statistical evidence
of beaker to beaker variation in lengths. Thus the analysis in Figure VII.6
is carried out on a per beaker basis rather than on a per daphnid basis.
Using the error yardstick calculated only from the results in the two control
groups (and thus having 6 d.f.) we see that the average lengths in the two
groups are not statistically significantly different.

A suggested variability estimate for comparisons of average lengths
among groups is discussed in Subsection V.D. The estimate is based on
the mean square between beakers within groups, with degrees of freedom
based on pooling information from the variability among beakers within
groups and the variability among daphnids within beakers. The suggested
variability estimate is MSI = 0.2707 with 141 d.f. In order for the
variability estimate to apply to average lengths within beakers we must
divide it by the sample size within beakers. The average number of surviving
adults per beaker in the two groups is 139/8 = 17.375. Thus the
error estimate, normalized to a per average basis, is 0.2707/17.375 = 0.0156.
(This compares with 0.0173 in Figure VII.6.) Using this error yardstick
we obtain the F-ratio

F = 0.0372/0.0156 ≈ 2.39

We compare this to the percentiles of an F-distribution with d.f. 1 and
141. The resulting significance level is α = 0.124. This is at most
marginal. There is thus a suggestion that the average lengths in the
water control group are greater than those in the solvent control group,
but nothing conclusive. We will combine the responses from both control
groups in subsequent analyses.
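The normalization and F-ratio for Test B can be reproduced with the short Python sketch below. The inputs (MSI = 0.2707 with 141 d.f., 139 survivors in 8 beakers, and the between groups mean square 0.0372) come from the text. The F tail area is computed in pure Python through the regularized incomplete beta function, an implementation choice of ours rather than the procedure used in the report. Carrying unrounded intermediate values gives F of about 2.39.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper tail area P(F > f) for an F-distribution with (d1, d2) d.f."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Inputs taken from the text (LeBlanc Test B, 21 day lengths).
msi_per_daphnid = 0.2707      # pooled per daphnid estimate, 141 d.f.
survivors, beakers = 139, 8
ms_between_groups = 0.0372    # between groups mean square, 1 d.f.

# Variance of a beaker average = per daphnid variance / daphnids per beaker.
ms_per_average = msi_per_daphnid / (survivors / beakers)
f_ratio = ms_between_groups / ms_per_average
p_value = f_sf(f_ratio, 1, 141)
print(f"error = {ms_per_average:.4f}, F = {f_ratio:.2f}, alpha = {p_value:.3f}")
```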

It should be noted that the variability among beaker averages is significantly
greater (α = 0.04) in the solvent control group than in the water
control group. Figure II.29 shows that the greater variability in the
solvent control group is due to a single beaker average which is somewhat
removed from the others in that group. Without this relatively high value,
the average length in the solvent control group would be substantially
lower than that in the water control group and there would probably be
a statistically and biologically significant difference between them.


D.  CHAPMAN  -  BERYLLIUM 


Ten  daphnids  were  placed  on  test  in  each  group,  one  per  beaker.  Thus 
the  sample  sizes  in  this  data  set  are  considerably  smaller  than  those  in 
the  LeBlanc  data  sets.  Furthermore,  since  the  daphnids  were  individually 
housed, there is no question of beaker to beaker variation producing correlated
responses.

Survival 


The mortality responses in the water control and in the solvent control
groups are compared by carrying out a 2 by 2 contingency table test for
heterogeneity. Two tailed tests are used. The output is shown in Figure VII.7.

Because of the small numbers of daphnids per group, the expected
frequencies are relatively small. Under the hypothesis of homogeneity, the
expected number of dead daphnids per group is 1.0 (two deaths among 20
daphnids, shared equally between two groups of ten). This raises questions
about the validity of the asymptotic theory on which the usual Pearson chi
square test is based. We see, in fact, that the observed significance
levels of Fisher's exact test (two tailed) and the chi square test with
correction differ considerably from the observed significance level of the
uncorrected chi square test. The conclusion, based on the chi square test
with correction or on Fisher's exact test, is that there is no
statistical evidence of differences in mortality between the solvent and
water control groups (α = 0.47).
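With only two deaths in total, Fisher's exact test is the natural reference. A minimal Python sketch of the two sided version is given below, together with the expected counts under homogeneity. It follows the usual convention of summing the probabilities of all tables with the observed margins that are no more probable than the observed table. For the Chapman control counts (0 of 10 and 2 of 10 dead) it gives 90/190, about 0.47.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two sided Fisher exact test for the 2 by 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k):
        # probability that k of the col1 column-1 units fall in row 1
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # a small relative tolerance guards against floating point ties
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))


def expected_counts(a, b, c, d):
    """Expected cell counts under homogeneity of the two rows."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


# Chapman beryllium controls: rows are (water, solvent), columns are (dead, alive).
counts = (0, 10, 2, 8)
print(expected_counts(*counts))
p_fisher = fisher_exact_two_sided(*counts)
print(f"Fisher exact, two sided: {p_fisher:.4f}")
```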

Reproduction 

Comparisons between water and solvent control groups are based only on
those daphnids that survived to the end of the test. There were ten survivors
in the water control group and eight survivors in the solvent control
group. The daphnids that died early obviously present a distorted view of
the total number of young produced, and so their responses are not included in
this comparison. Total numbers of young per individual adult daphnid can
be determined for these data since the daphnids were housed just one to a
beaker.

The output from the comparison appears in Figure VII.8. The analysis in
Figure VII.8 is carried out by means of a one way analysis of variance with
two groups, based on an error yardstick calculated only from the results
in the two control groups (and thus having 18 - 2 = 16 d.f.). On the basis of this
analysis we conclude that there is no statistical evidence of differences
in production rates in the two groups. However, the average production rate
in the solvent control group is about 18 percent higher than that in the
water control group.

Figure VI.12 shows the results of an analysis of variance fit to all the
groups. On the basis of this fit an error mean square of 2365.759 with 55
d.f. is estimated. This mean square is somewhat lower than that in Figure
VII.8, based solely on the control groups. The plot in Figure II.17 suggests
that the control group responses are higher and more variable than the
treatment group responses. (However, a two tailed F-test to compare the
error mean square in the treatment groups with that in the control groups
is nonsignificant (F = 1.71, α = 0.17).) If we recalculate the F-ratio
in Figure VII.8 using the error estimate in Figure VI.12 we obtain

F = 3240.00/2365.759 = 1.369

When compared to an F-distribution with degrees of freedom 1 and 55, the
observed significance level is α = 0.247. Thus, the conclusions are
unchanged.

Length 

The 21 day lengths of surviving adults in the water control and solvent
control groups are compared by carrying out a two sample, two tailed t-test
or equivalently by a one way analysis of variance with two groups. Output
from the one way analysis of variance appears in Figure VII.9. The analysis
in Figure VII.9 is based on an error yardstick calculated only from the
results in the two control groups (and thus having 16 d.f.). On the basis
of this analysis we conclude that there is no statistical evidence of
differences in average lengths among survivors in the two groups. The
average lengths are virtually identical (4.28 mm in the water control
group and 4.275 mm in the solvent control group).

Figure VI.26 shows the results of an analysis of variance fit to all
the groups. On the basis of this fit an error mean square of 0.108 with
55 d.f. is estimated. This mean square is somewhat greater than that in
Figure VII.9, based solely on the control groups. (A two tailed F-test
to compare the error mean square in the treatment groups with that in the
control groups is significant at α = 0.04.) Since the two group averages are
virtually identical and a larger error mean square can only reduce the F-ratio,
recalculating the F-ratio based on the mean square in Figure VI.26 would not
change the conclusions arrived at based on the analysis in Figure VII.9.

E.  SUMMARY 

Three of the data sets under consideration contain both water and solvent
control groups. These are LeBlanc's Tests A and B and Chapman's Beryllium Test.
Comparisons  of  the  average  survival,  length  and  reproduction  responses  were 
made  between  the  two  control  groups  in  each  of  the  tests.  Statistically 
(and  perhaps  biologically)  significant  differences  were  seen  for  all  three 
responses  in  LeBlanc's  Test  A,  especially  for  reproduction  and  length.  No 
statistically  significant  differences  were  found  for  any  of  the  responses 
either  in  LeBlanc's  Test  B  or  in  Chapman's  Beryllium  Test. 

Based  on  the  outcomes  of  these  comparisons,  in  subsequent  sections  we 
will  base  comparisons  of  treatment  and  control  group  responses  on  pooled 
water  and  solvent  control  group  responses  in  LeBlanc's  Test  B  data  and  in 
Chapman's  Beryllium  data  and  we  will  make  separate  comparisons  with  the 
water  control  and  with  the  solvent  control  group  responses  in  LeBlanc's 
Test A. The results of both sets of comparisons in LeBlanc Test A will be
reported. However, this will not resolve the conceptual issues of interpretation
that were discussed in Subsection A.
Figure VII.3. Comparison of 21 day lengths of surviving adults between solvent and
water control groups. LeBlanc test A.


Figure VII.4. Comparison of mortality rates in solvent and water control groups.
LeBlanc test B. (Group 1 - water control. Group 2 - solvent control.)


Figure VII.5. Comparison of cumulative offspring per surviving adult between solvent
and  water  control  groups.  LeBlanc  test  B. 
Figure VII.6. Comparison of 21 day lengths of surviving adults between solvent and
water control groups. LeBlanc test B.
Figure VII.8. Comparison of total number of offspring between solvent and water control
groups. Chapman - beryllium. Comparison is confined to those daphnids
that survived to the end of the test.


Figure VII.9. Comparison of 21 day lengths of surviving adults between solvent and
water control groups. Chapman - beryllium.


VIII.  COMPARISON  OF  LENGTH  AND  REPRODUCTION  RESPONSES 


A.  INTRODUCTION 

A  number  of  toxicologists  have  informally  suggested  that  there  is  an 
association between the lengths of adult daphnids and the numbers of offspring
that they produce. If there is in fact a strong positive correlation
between  these  variables  then  perhaps  it  would  suffice  just  to  measure  the 
adult  daphnids  rather  than  to  count  their  offspring.  The  latter  task  is 
much  more  tedious  and  time  consuming  than  the  former  and  so  the  possibility 
of  being  able  to  do  away  with  measuring  reproduction  is  attractive  from 
both  the  cost  and  work  standpoint.  However  before  suggesting  this,  we 
would  need  to  determine  whether  any  information  would  be  lost  by  relying 
solely  on  length  measurements. 

Some individuals have even gone a step further (with tongue firmly
in cheek?) and suggested that there is a strong association
between 7 day lengths and 21 day lengths, so that adult daphnids need only
be measured on day 7. Thus, to carry matters to the extreme, life cycle
effects could be predicted by 21 day lengths and reproduction, reproduction
could be predicted by 21 day lengths, and 21 day lengths could in turn be
predicted by 7 day lengths! That would introduce wonderful economies of operation!

We limit the discussion in this section to briefly studying the association
between 21 day lengths and 21 day cumulative reproduction per survivor.
Subsection B contains scatterplots of the relations between 21 day lengths
and reproduction. Subsection C contains regression fits relating these
variables and inferences about the predictability of cumulative reproduction
based on lengths. Subsection D contains plots of residuals from the fits.
A summary and conclusions follow.

B.  PRELIMINARY  SCATTERPLOTS 

This  subsection  contains  scatterplots  that  display  the  associations 
between  21  day  lengths  and  21  day  cumulative  reproduction  for  LeBlanc's 
Tests  A  and  B  and  for  Chapman's  Beryllium  Test.  In  both  of  LeBlanc's  data 
sets  the  daphnids  were  multiply  housed,  20  to  a  beaker  at  the  outset  of  the 
test.  Thus  reproduction  was  determined  on  a  per  beaker  basis  and  cannot  be 
associated  with  individual  daphnids.  By  contrast  lengths  were  determined 
on  a  per  daphnid  basis  and  so  can  be  associated  with  individual  daphnids. 
Thus  the  only  comparisons  that  can  be  made  in  LeBlanc's  data  sets  are  on  a 
beaker  basis.  Namely  21  day  cumulative  production  per  surviving  adult  is 
compared  to  21  day  average  length  among  survivors,  for  each  beaker.  Of 
course  the  reproduction  and  length  values  are  not  necessarily  based  on 
exactly  the  same  set  of  daphnids  since  those  that  died  before  the  end  of 
the  test  influence  the  reproduction  values  but  not  the  average  lengths. 

In  Chapman's  data  the  situation  is  better.  Since  the  daphnids  were  housed 
individually,  21  day  cumulative  production  and  21  day  length  determinations 
can  be  associated  on  a  daphnid  by  daphnid  basis. 


Figure VIII.1 shows the relationship between 21 day cumulative reproduction
per surviving adult and average 21 day lengths for LeBlanc's Test A.
Each point corresponds to a beaker and the plotting symbol is the group number.
Note that the results from group 7 have been excluded from this plot because
of high mortality. Two or more coincident points from different groups are
plotted as an asterisk. There is a definite positive association between
these responses. The trend appears approximately linear for higher length
values and flattens out for lower length values. There is a fair bit of
scatter about the general trend.

Figure VIII.2 shows the relationship between 21 day cumulative reproduction
per surviving adult and average 21 day lengths for LeBlanc's Test B.
The plotting symbols have the same meaning as in Figure VIII.1. There is
no association between the two responses in this test.

Figure VIII.3 shows the relationship between 21 day cumulative reproduction
and 21 day lengths for daphnids that survived to the end of the test in Chapman's
Beryllium Test. Each point corresponds to an individual daphnid. The plotting
symbol is the group number. Two or more coincident points from different
groups are plotted as an asterisk. There is a rather strong positive
association between the responses, stronger than that in Figure VIII.1 for
LeBlanc's data. The trend appears approximately linear for higher length
values and flattens out for lower length values. There is much less scatter
about this trend than about the trend in Figure VIII.1.

In order to enhance the straight line trend in the lower portion of
the plot, a logarithmic transformation of cumulative production was made.
Figure VIII.4 shows the relationship between log10 (21 day cumulative
production) and 21 day lengths for daphnids that survived to the end of
the test. There now appears to be a straight line trend throughout the
entire range of length values; however, the variability about the trend line
is greater for shorter daphnids than for longer ones. The strength of the
relationship is again very much greater than that in Figure VIII.1.

C.  REGRESSION  ANALYSES 

The preliminary scatterplots in Subsection B suggest that there is at
least some association between length and cumulative reproduction. To
quantify the extent of this relationship, simple linear regression models
were fitted to predict production from length in LeBlanc's Tests A and B
and in Chapman's Beryllium Test, and to predict log10 (production) from
length in Chapman's Beryllium Test. The results of these regression fits
are shown in Figures VIII.5 to VIII.8.

The results in Figure VIII.5 show that there is a significant association
between length and reproduction but lengths explain just 29 percent
of the variation in production. Thus better than two thirds of the variation
in production is explained by factors other than length. This and
the appearance of Figure VIII.1 strongly suggest that 21 day average length
per beaker is not in and of itself a good predictor of 21 day cumulative
reproduction in Test A, although the two responses are related.
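A simple least squares fit and its R² (the proportion of variation explained, 0.29 in the Test A fit above) can be sketched in a few lines of Python. The lengths and offspring counts below are hypothetical. The log10 line mirrors the transformation used for Chapman's data but is applied here only to the made-up values.

```python
import math
from statistics import mean


def simple_linear_regression(x, y):
    """Least squares fit y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical 21 day lengths (mm) and cumulative young for individual daphnids.
length = [3.6, 3.8, 4.0, 4.2, 4.4, 4.6]
young = [12.0, 30.0, 41.0, 70.0, 66.0, 102.0]
b0, b1, r2 = simple_linear_regression(length, young)
_, _, r2_log = simple_linear_regression(length, [math.log10(v) for v in young])
print(f"raw scale: slope = {b1:.1f} young per mm, R^2 = {r2:.2f}")
print(f"log10 scale: R^2 = {r2_log:.2f}")
```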


The results in Figure VIII.6 and the scatterplot in Figure VIII.2
show no evidence of any association between 21 day average length per
beaker and 21 day cumulative reproduction in Test B.

The results in Figures VIII.7 and VIII.8 and the scatterplots in
Figures VIII.3 and VIII.4 show moderately strong associations between 21
day length and 21 day cumulative production per daphnid in Chapman's
Beryllium Test. The regression in Figure VIII.8 explains nearly 60 percent
of the variability in log10 (production). This is a much stronger association
than was the case in LeBlanc's tests. It is interesting to note that
the parameters of the regression model fitted to the Test A data in
Figure VIII.5 are quite similar to those fitted to the Beryllium data in
Figure VIII.7.

In  summary,  the  degree  of  association  is  strongest  in  Chapman's  test 
and  weakest  in  LeBlanc's  Test  B.  The  degree  of  association  differs  markedly 
from  test  to  test. 
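Each of the fits in Figures VIII.5 to VIII.8 is an ordinary least squares line, and the "percent of variation explained" quoted above is the coefficient of determination, R^2. The following minimal Python sketch shows that computation, assuming the paired observations are supplied as plain lists; the function name and the data in the usage lines are hypothetical illustrations, not the report's data or the software used to produce the figures.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared), where r_squared is the fraction of the
    variation of y about its mean that is explained by the fitted line.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Residual sum of squares compared with the total sum of squares about the mean.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1.0 - sse / syy


# Hypothetical lengths (mm) and cumulative production, for illustration only.
lengths = [3.1, 3.4, 3.8, 4.0, 4.3, 4.6]
production = [38.0, 52.0, 70.0, 105.0, 130.0, 190.0]
b0, b1, r2 = simple_linear_regression(lengths, [math.log10(p) for p in production])
print(f"LPROD = {b0:.4f} + {b1:.4f} LENGTH, R^2 = {r2:.2f}")
```

Fitting log10 (production) rather than production itself mirrors the choice made for the Chapman data in Figure VIII.8.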

D.  RESIDUAL  DISPLAYS 

Residuals are the differences between the observed values of the responses
and the values predicted by the regression model. Residuals can reveal
systematic structure in the data that was not accounted for by the regression
fit, departures from model assumptions such as lack of independence or
nonconstant variance, outliers in the data, associations with variables not
included in the fit, and so on. Residuals can be studied by preparing various
types of graphical and numerical displays designed to detect particular types
of structure. If the fitted regression model accounts for all the systematic
behavior in the responses, the residuals should resemble random noise. Any
systematic behavior in the residuals suggests that the fitted regression model
does not fully describe the structure in the data.

Plots of residuals versus predicted values, squares of residuals versus
predicted values, normal probability plots of residuals, and scatterplots
of residuals versus group number were prepared. Most of the plots did not
reveal any anomalies or departures from assumptions. However, the plots of
residuals by group revealed some systematic structure in the data over and
above that explained by the regression model. These plots are shown in
Figures VIII.9 to VIII.11. If length accounted for all the systematic
behavior in production, the residuals in these plots would be distributed
randomly about 0 with no trends across groups. This is clearly not the case.
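The residual-by-group displays can be sketched as follows: fit the length regression once, subtract the fitted values from the observed values, then average the residuals within each group (the quantity marked M in the plots). This is a minimal Python illustration that assumes responses, lengths, and group labels arrive as parallel lists; `residuals_by_group` is a hypothetical helper name and the usage data are invented for illustration.

```python
from collections import defaultdict


def residuals_by_group(x, y, groups):
    """Fit y = b0 + b1 * x by least squares and summarize residuals by group.

    Returns (residuals, group_means); group_means maps each group label to
    the average residual in that group.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    b0 = ybar - b1 * xbar
    # Residual = observed response minus the value predicted from length alone.
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    by_group = defaultdict(list)
    for g, r in zip(groups, residuals):
        by_group[g].append(r)
    group_means = {g: sum(rs) / len(rs) for g, rs in sorted(by_group.items())}
    return residuals, group_means


# Hypothetical lengths, production values, and group numbers.
resid, means = residuals_by_group(
    [3.2, 3.5, 3.9, 4.1, 4.4, 4.6], [40, 55, 80, 120, 140, 150], [1, 1, 2, 2, 3, 3])
print(means)
```

If length explained all the systematic behavior, the group averages in `group_means` would scatter around 0 with no trend across groups.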

Figure VIII.9 displays the residuals from the regression fit in LeBlanc's
Test A, plotted by group. Recall that the group 7 results were deleted
because of the very high mortality rate. A systematic pattern can be seen:
there is a generally upward trend in the residuals as group number increases
from 1 to 5, and then a sharp drop in the residuals in group 6.

This suggests that average length decreases more rapidly than production
in groups 1 to 5, and that production decreases much more rapidly than length
in going from group 5 to group 6. Since group number is really a surrogate
for concentration level, this behavior suggests that toxicant concentration
affects cumulative production in a way that cannot be fully explained by its
effect on length. That is, concentration is associated with production over
and above the association of length with production.

Figure VIII.10 displays the residuals from the regression fit in LeBlanc's
Test B, plotted by group. A systematic pattern is again evident: the average
residual within groups follows a curvilinear trend, first rising and then
falling. This suggests that at lower concentrations length decreases more
rapidly than productivity, while at higher concentrations productivity
decreases more rapidly than length. This again shows that productivity is
associated with concentration over and above its association with length.

Figure VIII.11 displays the residuals from the regression fit in Chapman's
beryllium test, plotted by group. A systematic trend is again evident: the
average residual within groups follows a curvilinear trend, first falling and
then rising. This trend may be due either to lack of fit of the simple linear
regression model or to effects of concentration on production that are not
reflected in effects on length. To assess whether there is any systematic
lack of fit in the simple linear regression relating length and log10
(production), the residuals from this fit were plotted versus predicted
values (which are a linear function of length). The plot is displayed in
Figure VIII.12.

To assess whether there is any systematic trend in the residuals, the
range of predicted values was subdivided so that there would generally be
10 to 20 points within each interval. The median of the residuals was
calculated in each interval and is indicated by the symbol M. The trend in
these medians is characteristic of the kind of curvature that a polynomial
model could capture. It is interesting to note that the trend in Figure
VIII.11 mimics that in Figure VIII.12, although the dip in the group means in
Figure VIII.11 is deeper than the dip in the medians in Figure VIII.12. Thus
at least part of the trend observed in Figure VIII.11 might be eliminated by
adding quadratic or cubic terms to the regression in Figure VIII.8 relating
log10 (reproduction) and length. This should be done to determine whether it
improves the fit, and the residuals should then be recalculated. Since this
path was not pursued, we cannot say to what extent the trend in Figure
VIII.11 reflects quadratic or cubic effects in the regression relationship
between length and log10 (reproduction), or to what extent it reflects
effects of concentration on production not associated with effects on
length. Sorting this out awaits future work. However, the nearly random
appearance of Figure VIII.12 suggests that adding nonlinear terms to the
regression model will not markedly improve the strength of the fit. Thus
most of the systematic behavior in Figure VIII.11 is probably due to factors
other than length.
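The interval medians marked M in Figure VIII.12 can be approximated as follows. The report subdivided the range of predicted values so that most intervals held 10 to 20 points; this minimal Python sketch assumes equal-count bins instead (15 points by default, with a short final bin merged into its neighbor), which is an illustrative choice rather than the report's exact rule. The function name and usage data are hypothetical.

```python
import statistics


def binned_median_residuals(predicted, residuals, points_per_bin=15):
    """Median residual within consecutive bins of the predicted values.

    Points are sorted by predicted value and cut into bins of about
    points_per_bin points each; a short final bin is merged into the one
    before it. Returns a list of (median predicted value, median residual).
    """
    pairs = sorted(zip(predicted, residuals))
    bins = [pairs[i:i + points_per_bin] for i in range(0, len(pairs), points_per_bin)]
    if len(bins) > 1 and len(bins[-1]) < points_per_bin // 2:
        last = bins.pop()
        bins[-1].extend(last)
    return [(statistics.median(p for p, _ in b), statistics.median(r for _, r in b))
            for b in bins]


# Hypothetical predicted values and residuals (40 points).
pred = [1.5 + 0.02 * i for i in range(40)]
resid = [((7 * i) % 11 - 5) / 50 for i in range(40)]
for center, med in binned_median_residuals(pred, resid):
    print(f"{center:.3f} {med:+.3f}")
```

A systematic rise or fall in the bin medians, rather than scatter around 0, is what would point to curvature that quadratic or cubic terms might absorb.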


E.  INFERENCES  BASED  ON  THE  REGRESSION  MODEL 

Whether a regression fit is adequate depends on whether the inferences of
interest can be made from it with enough precision to be of practical use.
The natural inference to be made from a regression model relating length and
production is the prediction of expected production given length. This might
take the form of a confidence interval on mean production conditional on
length, or a prediction interval on the sample average of, say, 10 daphnids
conditional on length. We consider the calculation of 95 percent confidence
intervals for various values of length.

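We illustrate the procedure with Chapman's beryllium data. Let LPROD
denote log10 (PROD). From Figure VIII.8 we determine that the fitted
regression model is

$$\widehat{\mathrm{LPROD}} = \hat\beta_0 + \hat\beta_1\,\mathrm{LENGTH} = 0.15515 + 0.46306\,\mathrm{LENGTH}$$

Now the average length is $\overline{\mathrm{LENGTH}} = 3.96508$ mm. Thus the
above equation can be rewritten as

$$\widehat{\mathrm{LPROD}} = \overline{\mathrm{LPROD}} + \hat\beta_1\,(\mathrm{LENGTH} - \overline{\mathrm{LENGTH}}) = 1.9912 + 0.46306\,(\mathrm{LENGTH} - 3.96508)$$

Thus the standard error of $\widehat{\mathrm{LPROD}}$ is

$$\hat\sigma(\widehat{\mathrm{LPROD}}) = \left[\frac{\hat\sigma^2}{n} + \hat\sigma^2(\hat\beta_1)\,(\mathrm{LENGTH} - \overline{\mathrm{LENGTH}})^2\right]^{1/2} = \left[\frac{0.0241}{63} + 0.0024\,(\mathrm{LENGTH} - 3.9651)^2\right]^{1/2}$$

A 95 percent confidence interval on average LPROD is

$$\widehat{\mathrm{LPROD}} \pm t(0.975;\,61)\,\hat\sigma(\widehat{\mathrm{LPROD}}) = \widehat{\mathrm{LPROD}} \pm 2.000\,\hat\sigma(\widehat{\mathrm{LPROD}})$$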

We  calculate  predicted  mean  productions  and  95  percent  confidence  intervals 
for  various  values  of  length  by  exponentiating. 


LENGTH (mm)     PROD    LOWER 95% CONF. BOUND    UPPER 95% CONF. BOUND
   3.00        35.02           27.67                    44.33
   3.50        59.68           51.98                    ≈ 68.5
   4.00       101.72           92.92                   111.34
   4.50       173.35          149.12                   201.52
   4.75       226.30          185.53                   276.03

The upper confidence bounds are about 50 to 60 percent greater than the
lower confidence bounds at the extremes of the range (i.e., 3.00 and 4.75 mm)
and about 20 percent greater in the middle of the range (i.e., 4.00 mm). Such
precision (or lack of precision) may be adequate for assessing general trends
in production with increasing length, but is probably not adequate for using
length as a surrogate response for production.
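The interval calculation can be checked numerically. The following Python sketch uses only the summary values quoted in the text (intercept 0.15515, slope 0.46306, average length 3.9651 mm, residual mean square 0.0241 with n = 63, slope variance 0.0024, and t(0.975; 61) taken as 2.000); it computes the fitted value and the bounds on the log10 scale and then raises 10 to each. The function name `conf_interval` is hypothetical.

```python
import math

# Summary values quoted in the text for the fit of LPROD = log10(PROD)
# on LENGTH in Chapman's beryllium data (Figure VIII.8).
B0 = 0.15515          # fitted intercept
B1 = 0.46306          # fitted slope
MEAN_LENGTH = 3.9651  # average length (mm)
SIGMA2 = 0.0241       # residual mean square
N = 63                # number of observations in the fit
VAR_B1 = 0.0024       # estimated variance of the slope
T_975_61 = 2.000      # t(0.975; 61), as used in the text


def conf_interval(length):
    """Predicted PROD and 95 percent confidence bounds at a given length (mm)."""
    lprod = B0 + B1 * length
    se = math.sqrt(SIGMA2 / N + VAR_B1 * (length - MEAN_LENGTH) ** 2)
    half_width = T_975_61 * se
    # Back-transform from the log10 scale by raising 10 to each value.
    return 10 ** lprod, 10 ** (lprod - half_width), 10 ** (lprod + half_width)


print("LENGTH     PROD    LOWER    UPPER")
for length in (3.00, 3.50, 4.00, 4.50, 4.75):
    pred, lower, upper = conf_interval(length)
    print(f"{length:6.2f} {pred:8.2f} {lower:8.2f} {upper:8.2f}")
```

Run as written, the loop should reproduce the tabulated values to within rounding in the last digit. Because the intervals are computed for mean log10 production, their back-transformed widths grow with distance from the average length, which is the pattern described above.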

F.  SUMMARY  AND  CONCLUSIONS 

We have studied the relationship between production and length in three
data sets. The extent of association varied considerably among the tests.
This suggests that no generalization can be made about the association
between these variables across tests; in some tests they will be more
strongly associated than in others.

Both production and length responses pertained to beakers in the LeBlanc
data sets but to individual daphnids in the Chapman data set.

The degree of association between production and length was much stronger
in the Chapman data than in the LeBlanc data. This suggests that responses
should be collected on a per daphnid basis rather than on a per beaker basis
if the two variables are to be related by regression models.

Only 21 day production and 21 day lengths were used in the regression
models fitted in this section. Perhaps stronger association could be obtained
if increments of production were related to intermediate values of length and
to changes in these values. Such an effort, involving perhaps 7, 14, and 21
day lengths and production, would be somewhat more complex than working with
21 day responses alone.


Figure VIII.2. 21 day cumulative reproduction per surviving adult per beaker
versus 21 day average length per beaker - LeBlanc Test B.

Figure VIII.3. 21 day cumulative reproduction versus 21 day length for
surviving adults - Chapman Beryllium.

Figure VIII.4. 21 day log10 (cumulative reproduction) versus 21 day length for
surviving adults - Chapman Beryllium.

Figure VIII.5. Simple linear regression of 21 day cumulative reproduction per
surviving adult versus average 21 day length - LeBlanc Test A.

Linear regression of 21 day cumulative reproduction ...

Figure VIII.8. Simple linear regression of log10 (21 day cumulative reproduction)
versus 21 day length for surviving adults - Chapman Beryllium.

Figure VIII.9. Residuals from simple linear regression of reproduction on length
plotted versus group number - LeBlanc Test A (group 7 responses deleted)
(M corresponds to group average of the residuals).

Figure VIII.10. Residuals from simple linear regression of reproduction on length
plotted versus group number - LeBlanc Test B (M corresponds to group average
of the residuals).

Figure VIII.12. Residuals from simple linear regression of log10 (reproduction)
on length plotted versus predicted values - Chapman Beryllium.


IX.  TESTING  FOR  CONCENTRATION  RELATED  EFFECTS  ON  MORTALITY 


A.  INTRODUCTION 

After carrying out preliminary graphical displays, tests for beaker to
beaker heterogeneity within groups, outlier detection procedures, and
adjustments to account for beaker to beaker heterogeneity, we are ready to
proceed to the main portion of the data analysis. This involves comparing
responses across treatment groups to arrive at inferences about what
constitutes an "acceptable" concentration. In this and subsequent sections
we discuss a number of hypothesis testing and estimation procedures to
compare the responses obtained in the treatment groups with those in the
control group(s). Comparisons are made for mortality, length, and
reproduction responses. We first discuss comparisons of mortality rates
across groups.

This section discusses a number of hypothesis testing procedures for making
inferences about which treatment groups are statistically significantly
different from the control group(s). The chi square test for homogeneity,
measure of association tests based on various Goodman and Kruskal measures,
the Cochran Armitage test and its extensions, and Williams' test are discussed
and illustrated. The chi square test is an overall, shotgun type test, while
the other tests are one sided tests tailored to be sensitive to monotone
alternatives. They would thus be expected to be more sensitive than the chi
square test to the types of alternative hypotheses to be expected in aquatic
toxicity data.

The  following  sections  discuss  dose  response  curve  estimation  based  on 
probit  models,  inferences  based  on  these  fits,  and  confidence  interval 
inferences  to  compare  treatment  group  and  control  group  responses,  based 
either  on  unadjusted  response  rates  or  on  response  rates  smoothed  across 
groups  by  fitting  dose  response  curves. 

As  discussed  in  Feder  and  Collins  [1],  inferences  based  on  hypothesis 
testing  procedures  have  several  severe  drawbacks.  First  of  all  they  are 
based  on  the  notion  of  "statistical  significance"  and  do  not  account  at  all 
for  "biological  significance".  The  notion  of  statistical  significance  is 
dependent  on  the  sample  size  and  on  the  variability  of  the  responses  as 
well  as  on  the  magnitudes  of  the  effects  observed  in  the  test.  Thus  effects 
of  considerable  biological  importance  could  be  declared  not  statistically 
significant  if  the  sample  sizes  are  too  small  or  the  variability  of  responses 
is  too  great.  Conversely,  biologically  trivial  effects  could  be  strongly 
statistically  significant  if  the  sample  sizes  are  very  large. 

Inferences based on dose response curves can take biological significance
into account by formulating the statistical problem as determining the
toxicant concentration that results in, for example, a 10 percent or 25
percent increase in mortality over the control group rate, a decrease of
0.5 mm or 1 mm in average length compared with the control groups, or a
decrease in production of 25 or 50 offspring per adult compared with the
control group.


Furthermore, small sample sizes or variable responses have opposite effects
on estimates of "safe concentrations" depending on whether tests of hypotheses
or dose response curves are used as the basis for inference. Namely, if the
"safe concentration" is defined as the highest concentration whose response
rate is not statistically significantly different from the control group
rate, then small sample sizes or variable responses will result in failing to
reject H0 for moderate differences between treatment and control group
responses. This, in turn, will increase the reported "safe concentration":
the smaller and less precise the toxicity test, the greater the reported
"safe concentration". By contrast, if the "safe concentration" is defined as
the lower confidence bound on the concentration that produces a given
increase in mortality above the control group rate (for example, 10 percent),
or a given decrease in length or production compared with the control group,
then small sample sizes or variable responses will result in wider confidence
intervals and therefore smaller lower bounds. This, in turn, will decrease
the reported "safe concentration".

Thus inferences based on the percentiles of dose response curves yield
more conservative estimates from toxicity tests with limited information
than from toxicity tests with ample information, whereas inferences based on
tests of hypotheses yield less conservative estimates from toxicity tests
with limited information than from toxicity tests with ample information.

Opinion: Inferences based on estimated percentiles of dose response curves
are more appropriate than those based on hypothesis tests, because they
explicitly incorporate the notion of biological significance into the
reported value and because they yield more conservative estimates from
toxicity tests that provide limited information.

B.  ADJUSTMENT  OF  DATA 

Section V contains adjustment procedures to account for beaker to beaker
heterogeneity in mortality and length responses. Mortality responses were
adjusted by reducing the actual numbers of responses and daphnids per beaker
to effective numbers of responses and effective sample sizes. Effective
degrees of freedom were also calculated. Inference procedures on lengths were
adjusted for beaker to beaker heterogeneity by carrying out inferences on a
per beaker basis, but augmenting the degrees of freedom associated with error
estimates by using information about the extent of variation in responses
among beakers within groups relative to the variation among daphnids within
beakers.

In  this  section  we  illustrate  the  use  of  various  hypothesis  testing 
procedures  on  the  mortality  data  from  LeBlanc's  Tests  A  and  B  and  Goulden's 
Isophorone  Test.  The  adjusted  sample  sizes,  numbers  of  responses,  and 
degrees  of  freedom  for  LeBlanc's  Tests  A  and  B  are  displayed  in  Tables  V.3 
and  V.4,  respectively.  Goulden's  isophorone  mortality  data  show  no  evidence 
of  heterogeneity  of  responses  within  groups  among  the  beakers  with  multiple 
daphnids.  Thus  no  adjustments  are  needed. 
C. SEQUENTIAL TESTS


The classical approach to carrying out overall tests of hypotheses is to
include all groups in the test statistic. If the test fails to reject, one
concludes that there is no statistical evidence of concentration related
increases in mortality. If the test rejects, one follows up with a multiple
comparisons procedure such as Dunnett's or Williams' procedure to determine
which treatment groups differ significantly from the control group, and
thereby arrives at an MATC. This two part procedure can be combined into a
single procedure by using a sequential testing approach, which can be applied
with any test procedure. The test is first carried out using all the
treatment groups. If the test rejects the null hypothesis, the highest
treatment group is deleted and the test is repeated on the remaining groups.

No simultaneity adjustments are made. Each time the test rejects, another
group is deleted, and the process continues until the test no longer rejects.

The highest remaining treatment group is declared to be the MATC. If the
mortality rate is a monotone increasing function of concentration, then any
group whose mortality rate equals the control rate has probability at most
α (where α is the type I error level) of being declared significantly
different from the control group. This holds simultaneously for all such
groups. The reason is that a group whose mortality rate equals the control
rate can be declared significantly different from the control group only if
the highest concentration group among those not different from the control
group is so declared, and this has probability at most α. The idea of
carrying out tests in a sequential manner was communicated to me by
Dr. David Schoenfield.
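The sequential procedure can be sketched in a few lines. The version below is a minimal Python illustration, not the report's software: it assumes the groups are supplied in order (control first, then treatments by increasing concentration), uses the asymptotic chi square statistic of Subsection D as the test, and compares it with standard upper 5 percent points of the chi square distribution (enough for up to eight groups). The helper names and the example counts are hypothetical.

```python
# Upper 5 percent points of the chi square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592, 7: 14.067}


def chi_square_stat(dead, n):
    """Chi square statistic for homogeneity of mortality rates (2 x k table)."""
    p = sum(dead) / sum(n)  # pooled mortality rate
    if p in (0.0, 1.0):
        return 0.0  # all alive or all dead: no evidence of differences
    stat = 0.0
    for d, m in zip(dead, n):
        stat += (d - m * p) ** 2 / (m * p)                    # dead cell
        stat += ((m - d) - m * (1 - p)) ** 2 / (m * (1 - p))  # alive cell
    return stat


def sequential_matc(dead, n, crit=CHI2_CRIT_05):
    """Delete the highest group while the test rejects; no simultaneity adjustment.

    Index 0 is the control. Returns the index of the highest remaining
    treatment group (the MATC group), or 0 if every treatment group is deleted.
    """
    k = len(n)
    while k > 1:
        if chi_square_stat(dead[:k], n[:k]) <= crit[k - 1]:
            break  # the test no longer rejects
        k -= 1     # reject: drop the highest remaining group and retest
    return k - 1


# Hypothetical counts: control plus three treatment groups of 20 daphnids each.
print(sequential_matc(dead=[0, 1, 0, 20], n=[20, 20, 20, 20]))  # prints 2
```

Each stage uses the same nominal 5 percent level, matching the text's statement that no simultaneity adjustments are made; the monotonicity argument above is what keeps the overall error rate under control.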


D.  CHI  SQUARE  TEST  OF  HOMOGENEITY  ACROSS  TREATMENT  GROUPS 

The most commonly used, although not the most appropriate, overall test
for differences in mortality rates across treatment and control groups is
the chi square test for homogeneity. It is analogous to the "shotgun"
analysis of variance F test for quantitative responses. It is not entirely
appropriate for testing homogeneity of mortality responses in aquatic
toxicity tests because the treatment groups have a natural ordering (i.e.,
concentration level), the anticipated alternative may be of a particular
type (e.g., increased mortality), and the magnitude of response may be
monotone increasing or decreasing with increasing group number (e.g.,
increasing mortality or decreasing average production with increasing
concentration). The chi square test is not designed to take any of this
structure into account. It is thus relatively inefficient compared with
other test procedures that are designed to be sensitive to such one sided,
monotone alternatives. Several such alternative tests are discussed in
later subsections.

The form of the chi square test is well known and is discussed in a
number of books, papers, and reports. Feder and Collins [1], Subsection
XI.B, present expressions for the test statistic. Standard textbooks such
as Dixon and Massey [8], pp. 240-243, or Freund [9], pp. 287-290, discuss
the chi square test in detail and illustrate its application. The test
is implemented as a standard feature in most statistical computing systems.
For example, the procedure PROC FREQ in the SAS system (Barr et al. [10]) or
the program BMDP1F in the BMDP system (Dixon and Brown [7]) can be used to
carry out this test. The EXAX2 program (Feder and Willavize [2]) will also
carry out the chi square test, using asymptotic or exact small sample theory
depending on the magnitudes of the expected cell frequencies.

We  present  below  applications  of  the  chi  square  test  of  homogeneity 
to  the  mortality  responses  in  LeBlanc's  Tests  A  and  B  and  in  Goulden's 
Isophorone  Test.  In  LeBlanc's  Test  A  we  carry  out  separate  comparisons, 
using  the  water  control  and  using  the  solvent  control  groups  as  standards. 

We carry out separate comparisons against the two control groups rather
than combining them because significant differences in their mortality rates
were demonstrated in Subsection VII.B.

D1. LeBlanc Test A - Comparison With Water Control Group

We adjust for the effects of beaker to beaker heterogeneity within groups
by reducing the actual sample sizes and numbers of responses to effective
sample sizes and numbers of responses, as shown in Table V.3. Group 6 is
discounted more than the others because of the widely disparate mortality
rates among its four beakers. We pool the adjusted sample sizes across
beakers within groups and carry out the usual chi square test as if there
were no beaker to beaker heterogeneity. The estimated degrees of freedom for
each group are given in Table V.3, so the pooled degrees of freedom are
30 x 5 + 3 = 153. Strictly, we should compare the "chi square" statistic to
the upper 95 percent point of five times an F-distribution with 5 and 153
degrees of freedom rather than to a chi square distribution with 5 degrees of
freedom. Since the difference between the percentiles corresponding to
153 d.f. and infinite d.f. is negligible, we ignore this adjustment.

Program BMDP1F would not accept the nonintegral "sample sizes" and numbers
of "responses" that result from the adjustment process; it truncates all
frequencies down to the next lower integer. This program therefore could not
be used to compare adjusted frequencies across groups. It should be noted
that there is no theoretical reason for the program to carry out such
truncations.
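Nothing in the chi square computation requires integer frequencies, which is why adjusted counts pose no theoretical problem. The sketch below is a minimal, hypothetical Python illustration (not EXAX2 or BMDP1F): it computes the asymptotic chi square statistic for a 2 x k table of deaths and survivors directly from floating point counts, and obtains the upper tail probability from the closed form chi square survival function for integer degrees of freedom. It does not implement the exact small sample theory that EXAX2 offers.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in powers of sqrt(x).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi_square_homogeneity(dead, n):
    """Asymptotic chi square test of equal mortality rates across groups.

    dead[i] and n[i] may be noninteger (adjusted) frequencies; assumes at
    least one death and one survivor overall.
    Returns (statistic, degrees of freedom, upper tail probability).
    """
    p = sum(dead) / sum(n)  # pooled mortality rate under the null hypothesis
    stat = 0.0
    for d, m in zip(dead, n):
        stat += (d - m * p) ** 2 / (m * p)
        stat += ((m - d) - m * (1 - p)) ** 2 / (m * (1 - p))
    df = len(n) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical adjusted frequencies for a control and two treatment groups.
print(chi_square_homogeneity(dead=[1.4, 2.2, 7.9], n=[24.6, 25.1, 20.3]))
```

As a check on the tail function, chi2_sf(9.47, 4), chi2_sf(2.99, 3), and chi2_sf(1.78, 3) come out at about 0.05, 0.39, and 0.62, in line with the significance levels quoted for Figures IX.2, IX.5, and IX.10.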

The EXAX2 program does allow the use of noninteger "frequencies", so we
used this program for the examples below. Figure IX.1 contains the results
of the chi square test of homogeneity across groups for the LeBlanc Test A
21 day mortality responses. The test is based on the adjusted responses and
sample sizes, as discussed previously. The water control group is used for
comparison purposes. Although the expected adjusted frequencies in group 6
are less than 5.0, we base the test on asymptotic chi square theory since
most of the cell frequencies are rather large and those in group 6 are
moderate.

The  observed  chi  square  value  of  177.63  (with  5  d.f.)  is  very  highly 
significant.  It  is  quite  clear  that  the  response  rate  in  group  7  differs 
from  those  in  the  other  groups.  Perhaps  the  response  rate  in  group  6  does 
also. 

We delete group 7 from the data and recalculate the test statistic.

Figure IX.2 contains the results of this test. The observed chi square
value is reduced to 9.47. The test, based on asymptotic theory, is
marginally significant (α = 0.05 or α = 0.06, depending on whether the chi
square distribution with 4 d.f. or the F-distribution with 4 and 123 d.f.
is used for comparison). There is thus borderline statistical evidence of
differences in response rates among the groups. Based on the appearance of
Figures IX.2 and II.1, it is clear that the response rate in group 6 differs
from those in the other groups.

The procedure could be continued by deleting group 6 and repeating the
test. However, based on the appearance of Figure II.1 and the significance
level in Figure IX.2, it is clear that the resulting chi square test would be
nonsignificant. The process was thus stopped at this point and group 5 was
declared to be the MATC group.

D2.  LeBlanc  Test  A  -  Comparison  With  Solvent  Control  Group 

The data and the adjustments are the same as those discussed in paragraph
D1, except that the solvent control group (group 2) is used for comparisons
in place of the water control group (group 1). The tests are again carried
out with the EXAX2 program. The results are similar to those obtained by
using the water control group.

Figure IX.3 contains the results of the chi square test of homogeneity
across groups for the LeBlanc Test A 21 day mortality responses, using the
solvent control group. The test is based on asymptotic theory. The observed
chi square value of 199.14 (with 5 d.f.) is very highly significant.

The response rate in group 7 differs from those in the other groups.

We recalculate the test statistic after deleting group 7 from the data.
Figure IX.4 contains the results of this test. The observed chi square is
reduced to 14.14 (with 4 d.f.). This is still very highly significant
(α = 0.007).

Thus there is still strong statistical evidence of differences in response
rates among the groups. Based on Figures IX.4 and II.1, it appears that the
response rate in group 6 differs from those in the other groups.

We recalculate the test statistic once more after deleting both groups
6 and 7 from the data. Figure IX.5 contains the result of this test. The
observed chi square is now 2.99 with 3 d.f., which is not statistically
significant (α = 0.39). The process is thus stopped at this point and
group 5 is again declared to be the MATC group.


D3.  LeBlanc  Test  B  -  Comparison  With  Combined  Water  and  Solvent 
Control  Groups 

The adjusted sample sizes and responses are shown in Table V.4. The
adjustment is very similar to that for the Test A data. Group 7 is
discounted more than the others because of the widely disparate mortality
rates among the four beakers. Since there were no statistically significant
differences between the mortality rates in the water and solvent control
groups, these two groups were combined into a common control group for
comparison with the treatment group responses. Other than that, the tests
were carried out in the same manner as those for LeBlanc's Test A data; in
particular, they were carried out using EXAX2 and asymptotic analyses.


Figure  IX. 6  contains  the  results  of  the  chi  square  test  of  homogeneity 
utilizing  all  the  treatment  groups  and  the  combined  control  groups.  The 
observed  chi  square  value  of  27.13  (with  5  d.f.)  is  very  highly  significant. 
The  response  rate  of  group  7  differs  from  those  in  the  other  groups. 

We delete group 7 from the data and recalculate the test statistic.
Figure IX.7 contains the results of this test. The observed chi square
value is now 5.82 with 4 d.f., which is not statistically significant
(α = 0.21). The process is stopped at this point and group 6 is declared
to be the MATC group.

D4.  Goulden  Isophorone  Test 

Attention is confined to the three beakers per group containing multiple
daphnids (5 daphnids per beaker). It was shown in Subsection III.B that
there is no statistical evidence of heterogeneity among beakers within
groups for this test. Thus we do not adjust the sample sizes and numbers
of responses before pooling data across beakers within groups and comparing
response rates across groups. As there is just one control group (group 1),
the comparability of results in water and solvent control groups is not an
issue. Other than these considerations, the tests were carried out in the
same manner as those for LeBlanc's data from Tests A and B; in particular,
they were carried out using EXAX2 and asymptotic analyses.

Figure  IX. 8  contains  the  results  of  the  chi  square  test  of  homogeneity 
utilizing  all  the  treatment  groups  and  the  control  group.  The  observed 
chi  square  value  of  36.50  (with  5  d.f.)  is  very  highly  significant.  The 
mortality  rate  in  group  6  is  considerably  higher  than  those  in  the  other 
groups. 

We delete group 6 from the data and recalculate the test statistic.
Figure IX.9 contains the results of this test. The observed chi square
has been reduced to 14.17 (with 4 d.f.), which is still highly significant
(α = 0.007). The mortality rate in group 5 appears to differ from those in
the other groups.

We  delete  group  5  from  the  data  and  recalculate  the  test  statistic. 
Figure  IX. 10  contains  the  results  of  this  test.  The  observed  chi  square 
is  now  1.78  with  3  d.f.,  which  is  not  statistically  significant  (a  =  0.62). 
The  process  is  stopped  at  this  point  and  group  4  is  declared  to  be  the 
MATC  group. 
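The sequential homogeneity procedure used in this subsection can be sketched in Python. This is a minimal sketch under stated assumptions, not the EXAX2 program. It works from pooled integer death counts per group and uses a closed-form chi square tail probability that is valid for integer degrees of freedom. The function names are illustrative. The Goulden counts (2, 1, 1, 3, 8, and 13 deaths out of 15 per group) are our reconstruction, obtained by multiplying the pooled response rates tabulated in Subsection F by 15. With these counts the sketch gives the chi square values 36.50, 14.17, and 1.78 quoted above and stops at group 4.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1 (closed form)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2[1 - Phi(sqrt(x))] plus a finite series in sqrt(x).
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * j + 1)
    return tail


def homogeneity_chi2(deaths, sizes):
    """Pearson chi square for equal mortality rates (assumes 0 < pooled rate < 1)."""
    p_bar = sum(deaths) / sum(sizes)
    pq = p_bar * (1.0 - p_bar)
    stat = sum((d - n * p_bar) ** 2 / (n * pq) for d, n in zip(deaths, sizes))
    return stat, len(deaths) - 1


def sequential_matc(groups, deaths, sizes, alpha=0.05):
    """Remove the highest concentration group until homogeneity is not rejected.

    Returns the highest remaining group (the MATC group) and a list of
    (highest group included, chi square, d.f., tail probability) records.
    """
    groups, deaths, sizes = list(groups), list(deaths), list(sizes)
    history = []
    while len(groups) > 1:
        stat, df = homogeneity_chi2(deaths, sizes)
        p_value = chi2_sf(stat, df)
        history.append((groups[-1], stat, df, p_value))
        if p_value >= alpha:
            break
        groups.pop()
        deaths.pop()
        sizes.pop()
    return groups[-1], history


# Goulden isophorone test, beakers with multiple daphnids pooled (15 per group).
matc, history = sequential_matc([1, 2, 3, 4, 5, 6], [2, 1, 1, 3, 8, 13], [15] * 6)
for top, stat, df, p_value in history:
    print(f"groups 1-{top}: chi square = {stat:.2f} ({df} d.f.), alpha = {p_value:.4f}")
print("MATC group:", matc)
```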

E.  ONE  SIDED,  MEASURE  OF  ASSOCIATION  TESTS  FOR  ORDERED  CONTINGENCY  TABLES 

The shotgun chi square test, although the most commonly used test of homogeneity of response rates, is not the most appropriate test for application to aquatic toxicity data. The reasons for this were discussed at the beginning of Subsection D, above. Tests of hypotheses that are designed to detect one sided, monotone alternatives are more sensitive to, and thus more appropriate for, the kinds of alternatives relevant in aquatic toxicity tests. One approach to the construction of one sided tests is by means of measures of association for ordered contingency tables. Goodman and Kruskal [11,12] have derived and reported on a number of measures. Feder and Collins [1], Subsection XI.C, discuss a number of these measures and inferences based on them in some detail. Measures discussed there include Goodman and Kruskal's gamma (γ), Kendall's τ_b, Stuart's τ_c, and Somers' d.

These measures can be thought of as ordered contingency table analogs of correlation coefficients for quantitative responses. However, for a given table each of these measures takes on a different value, and so it is difficult to ascribe physical meaning to any of the values. Thus we do not recommend using the values of these measures as indicators of the strength of a toxicant. However, for each of the measures a value of zero means no monotone association between group number and mortality rate. Positive or negative values of the measures mean positive or negative associations, respectively. It should be noted that, just as with correlation coefficients, a measure of monotone association can be zero in the presence of a strong but nonmonotone association. Thus these measures of association can be used to test null hypotheses of homogeneity of mortality rates across groups against alternatives of one sided, monotone trends. Procter [13] has shown that such tests are much more powerful against one sided, monotone alternatives than is the shotgun chi square test.

In order to use estimates of measures of association for statistical inferences, it is necessary to know something about their distribution, in particular their variability around the population value. Goodman and Kruskal [12] derive asymptotic (normal) distributions of these estimates by means of the delta method and present asymptotic standard errors. Brown and Benedetti [14] calculate improved standard error estimates for the various measures. These are more appropriate studentizing factors for testing the null hypothesis that the measures are zero. They show empirically that their standard error estimates yield better approximations to the nominal Type I error rates in small and moderate samples than do the Goodman and Kruskal standard error estimates. Furthermore, the Brown and Benedetti standard error estimates have a very interesting attribute. Even though the measures of association in general have different numerical values, the "t-ratios" formed by normalizing the measures by their respective standard errors have identical values. Thus there is just one t-ratio associated with all five measures (γ, τ_b, τ_c, and the two d's). This t-ratio can thus be interpreted without ambiguity.

Brown and Benedetti report, based on a simulation study, that for sample sizes in excess of 100 the "t-ratios" can be treated as normal random variables for the purpose of testing hypotheses about the significance of the relation. For sample sizes, N, less than 50 they recommend comparing the t-values to a t-distribution with approximately 0.4N degrees of freedom.

See the Brown and Benedetti paper for further details.
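The common t-ratio can be illustrated with a short Python sketch for a 2 × k table. The rows of the table are the groups in increasing concentration, and the columns are alive and dead. We assume that the null-hypothesis standard error leads to the ratio $t = (P - Q) \big/ \left\{2\left[\sum n_{ij}(C_{ij} - D_{ij})^2 - (P - Q)^2/N\right]^{1/2}\right\}$. Here $C_{ij}$ and $D_{ij}$ count the observations concordant and discordant with cell (i, j), $P = \sum n_{ij} C_{ij}$, and $Q = \sum n_{ij} D_{ij}$. This form is our assumption, not a transcription of Brown and Benedetti or of BMDP1F. Each of γ, τ_b, and Somers' d is P - Q divided by a table-specific constant, so studentizing each by a standard error scaled by that same constant yields this single ratio. Applied to the Goulden isophorone counts, the sketch reproduces the t-values 5.730, 2.583, and 0.472 reported later in this subsection. The function name is hypothetical.

```python
import math


def association_t_ratio(deaths, sizes):
    """Goodman-Kruskal gamma and an assumed common null t-ratio for a 2 x k table.

    Row i is group i in increasing concentration; the columns are alive and dead.
    """
    alive = [n - d for d, n in zip(deaths, sizes)]
    total = sum(sizes)
    p_sum = 0.0   # P = sum of n_ij * C_ij
    q_sum = 0.0   # Q = sum of n_ij * D_ij
    sq_sum = 0.0  # sum of n_ij * (C_ij - D_ij)^2
    for i in range(len(sizes)):
        alive_before, alive_after = sum(alive[:i]), sum(alive[i + 1:])
        dead_before, dead_after = sum(deaths[:i]), sum(deaths[i + 1:])
        # Alive cell: concordant with later deaths, discordant with earlier deaths.
        # Dead cell: concordant with earlier survivors, discordant with later survivors.
        cells = ((alive[i], dead_after, dead_before),
                 (deaths[i], alive_before, alive_after))
        for count, concordant, discordant in cells:
            p_sum += count * concordant
            q_sum += count * discordant
            sq_sum += count * (concordant - discordant) ** 2
    diff = p_sum - q_sum
    gamma = diff / (p_sum + q_sum)
    t_ratio = diff / (2.0 * math.sqrt(sq_sum - diff ** 2 / total))
    return gamma, t_ratio


# Goulden isophorone test: drop the top group at each step, as in the text.
deaths = [2, 1, 1, 3, 8, 13]
sizes = [15] * 6
results = {}
for last in (6, 5, 4):
    gamma, t_ratio = association_t_ratio(deaths[:last], sizes[:last])
    results[last] = (gamma, t_ratio)
    print(f"groups 1-{last}: gamma = {gamma:.3f}, t = {t_ratio:.3f}, "
          f"approx. d.f. = {0.4 * sum(sizes[:last]):.0f}")
```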

Feder and Collins empirically illustrate the increased sensitivity of the measure of association test to monotone alternatives, relative to the chi square test. They apply both tests to several sets of artificial data constructed to reflect mild, moderate, and strong trends. In each case the measure of association test is much more highly significant. See Feder and Collins for details.


The measure of association tests have been incorporated in the BMDP program BMDP1F [7]. As remarked in the previous subsection, this program will not accept nonintegral values for sample sizes and numbers of responses. Thus this program cannot be used directly on the adjusted data values. An indirect method of adjusting the test procedure for the presence of beaker to beaker heterogeneity is to carry out the test on the original, unadjusted values. The estimated standard error is then modified upward, and the t-ratio downward, by a factor reflecting the heterogeneity adjustments. Degrees of freedom would be based on the degrees of freedom arrived at in the adjustment process. Consider, for example, LeBlanc's Test A data. There were 7 × 4 × 20 = 560 daphnids used in this test. The adjusted sample sizes, numbers of responses, and numbers of degrees of freedom are displayed in Table V.3. The adjusted number of daphnids is 6 × 4 × 14.8 + 4 × 1.45 = 361, and the adjusted number of degrees of freedom is 6 × 30 + 3 = 183. We thus carry out the measure of association test based on the unadjusted frequencies. We then inflate the estimated standard errors of the various measures by the factor $(560/361)^{1/2} = 1.25$ and reduce the calculated t-ratio by this same factor.

The resulting value is compared to a normal distribution. This indirect adjustment procedure, while intuitively reasonable, has not been studied theoretically, so its theoretical properties are unknown.
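The arithmetic of the indirect adjustment for LeBlanc's Test A can be written out as a small Python sketch. The unadjusted t-ratio below is a hypothetical value for illustration only, since no BMDP1F result for Test A is quoted here. The helper name is also illustrative. The counts 560 and 361 and the 183 degrees of freedom are those given above.

```python
import math


def deflate_for_heterogeneity(t_ratio, n_raw, n_adj):
    """Indirect beaker-heterogeneity adjustment.

    The standard error is inflated, and the t-ratio deflated, by sqrt(n_raw / n_adj).
    """
    factor = math.sqrt(n_raw / n_adj)
    return t_ratio / factor, factor


# LeBlanc Test A: 7 groups x 4 beakers x 20 daphnids.
n_raw = 7 * 4 * 20
# Adjusted total from Table V.3: 6 x 4 x 14.8, plus 4 x 1.45 for group 6.
n_adj = 6 * 4 * 14.8 + 4 * 1.45
df_adj = 6 * 30 + 3

t_unadjusted = 3.0   # hypothetical BMDP1F t-ratio, for illustration only
t_adjusted, factor = deflate_for_heterogeneity(t_unadjusted, n_raw, n_adj)
print(f"N = {n_raw}, adjusted N = {n_adj:.1f}, d.f. = {df_adj}")
print(f"factor = {factor:.3f}, adjusted t = {t_adjusted:.3f}")
```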

We illustrate the application of the one sided measure of association test on the mortality data from Goulden's test on isophorone. Only the responses from the beakers with multiple daphnids were used. Since there was no statistical evidence of beaker to beaker heterogeneity within groups, no adjustments were carried out. The responses were pooled across beakers, and comparisons among groups were based on 15 daphnids per group. The test was carried out in a sequential manner, as discussed in Subsection C. The results are shown in Figures IX.11 - IX.13.

Figure IX.11 displays the results of the chi square and measure of association tests applied to all the data. Recall that the chi square test is a two sided test whereas the measure of association test is a one sided test. The chi square test statistic is compared to a chi square distribution with 5 d.f. and is seen to be very highly significant. The measure of association t-value is compared to a t-distribution with 90 × 0.4 = 36 d.f. The value, 5.730, is significant at α = 0.0000. Thus both tests show strong statistical evidence of a concentration effect.

We delete group 6 from the data and recalculate the test statistics. The results are shown in Figure IX.12. The chi square test statistic, with 4 d.f., is significant at α = 0.007. The measure of association t-value is compared to a t-distribution with 75 × 0.4 = 30 d.f. The value, 2.583, is significant at α = 0.007. Thus both tests show strong statistical evidence of a concentration effect, and at about the same α level.

We now delete group 5 from the data and recalculate the test statistics. The results are shown in Figure IX.13. The chi square statistic, with 3 d.f., is significant at α = 0.6195. The measure of association t-value is compared to a t-distribution with 60 × 0.4 = 24 d.f. The value, 0.472, is significant at α = 0.32. Thus neither test shows any statistical evidence of a concentration effect. However, the α level for the one sided test is much smaller than that for the chi square test, probably due to the increased mortality in group 4. The process is stopped at this point and group 4 is declared to be the MATC group. In this example the chi square and measure of association tests arrive at the same conclusion.

F. TREATMENT GROUP VERSUS CONTROL GROUP PAIRWISE COMPARISONS - WILLIAMS' TEST

A common approach to pairwise comparisons between the control group and the treatment groups is to use Dunnett's or Williams' procedure. Within each group the observed frequencies are adjusted for beaker to beaker heterogeneity and then pooled across beakers. The response rate is calculated based on the pooled data. For qualitative response rate data such as mortality rates, an arc sine variance stabilizing transformation is carried out on the response rate within each group, and comparisons are based on these transformed values. Dunnett's and Williams' procedures are discussed in a number of references [15,16,17,18,19]. Feder and Collins [1] discuss these procedures in Section XII. Chew [19] briefly describes Williams' test and presents tables for its implementation.

Williams' procedure is to be preferred to Dunnett's procedure if the mortality rate is a monotone increasing function of concentration. It takes account of this monotonicity and is thus more sensitive in detecting weak to moderate trends. Williams [17], Section 4, compares the power of his test (which he calls the T test) to that of a one sided t-test and to Dunnett's test. The distribution theory is sufficiently complex that the power comparisons must be carried out by Monte Carlo methods when there are three or more treatment groups. However, Williams concludes on page 113, based on the results of the Monte Carlo experiments, that "It is evident... the superiority of the T test over both the one sided t-test... and Dunnett's one-sided test becomes more marked as k (the number of treatment groups - P.F.) increases. There is no doubt that the T test should be used in preference to these two tests...".
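The adjustment of the transformed responses to a monotone sequence, used in each example below, can be sketched with a weighted pool-adjacent-violators algorithm. The weights are the effective group sample sizes. Following the tables in this subsection, the control group takes part in the pooling, while the comparisons use its unpooled value. This is our reading of how the tabulated $M_i$'s were obtained, not a transcription of Williams' paper, and the function names are hypothetical. With the LeBlanc Test B rates tabulated later in this subsection, the sketch reproduces the $M_i$ row (0.608, 0.608, 0.608, 0.735, 0.735, 1.826) to within rounding.

```python
import math


def arcsine_transform(p):
    """Variance-stabilizing transform X = 2 arcsin(sqrt(p)); Var(X) is about 1/n."""
    return 2.0 * math.asin(math.sqrt(p))


def monotone_means(values, weights):
    """Weighted pool-adjacent-violators fit of a nondecreasing sequence."""
    blocks = []  # each block: [weighted mean, total weight, number of groups]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the fitted sequence would decrease.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# LeBlanc Test B: combined control first, then groups 3-7.
sizes = [115.2, 57.6, 57.6, 57.6, 57.6, 8.96]
rates = [0.13, 0.049, 0.063, 0.163, 0.099, 0.626]
x = [arcsine_transform(p) for p in rates]
m = monotone_means(x, sizes)
print([round(v, 3) for v in m])
```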

We illustrate Williams' method with several examples based on analysis of the 21 day mortality data. Consider first the 21 day mortality data from LeBlanc's Test A, and compare treatment group responses to those in the water control group. We wish to determine which treatment groups exhibit significantly greater mortality rates than the water control group. We adjust the sample sizes for beaker to beaker heterogeneity within groups, as indicated in Table V.3. The assumed number of degrees of freedom is 30 × 5 + 3 = 153. The basic and transformed responses, pooled across beakers within groups, are:


Since these estimates are not in monotone sequence they need to be adjusted:

$M_1 = M_3 = M_4 = M_5 = (0.79 + 0.72 + 0.60 + 0.60)/4 = 0.68, \quad M_6 = 1.57, \quad M_7 = 2.92.$

We declare the group i response rate to be significantly greater than the control rate if

$M_i - X_1 > \bar{t}\,(1/n_i + 1/n_1)^{1/2}.$

The factors $\bar{t}$ can be obtained from Williams' tables (e.g. [18], Tables 1 and 3) corresponding to α = 0.05 or α = 0.01 and to ν = 153. This yardstick is based on the asymptotic approximation that the variance of $2 \arcsin \sqrt{p}$ is $1/n$. For simplicity, we use the values of $\bar{t}$ appropriate for equireplicated treatment groups, even though the effective sample size in group 6 is somewhat smaller than those in the other groups. We use the cutoff points corresponding to ν = 120 d.f. and choose the $\bar{t}$'s sequentially, corresponding to the number of treatment groups.
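The step-down comparisons can be sketched in Python as follows. The sketch assumes that the monotone means $M_i$ are already available and that the user supplies Williams' critical values $\bar t$ for each number k of remaining treatment groups. The function name and data layout are hypothetical. The usage lines apply the sketch to LeBlanc's Test A water control comparison, using $X_1 = 0.79$, $M_3 = M_4 = M_5 = 0.68$, $M_6 = 1.57$, $M_7 = 2.92$, and the $\bar t$'s 1.772, 1.765, and 1.754 from the comparisons below. It stops at group 5.

```python
import math


def williams_step_down(x_control, n_control, groups, tbar):
    """Step-down comparison of monotone treatment means M_i with the control X_c.

    groups: (label, M_i, n_i) tuples in increasing concentration.
    tbar:   mapping from k (treatment groups still under test) to Williams'
            critical value, read from Williams' tables by the user.
    Returns the MATC group label (None if every group is significant) and a
    list of (label, k, M_i - X_c, yardstick) records.
    """
    records = []
    for k in range(len(groups), 0, -1):
        label, m_i, n_i = groups[k - 1]
        diff = m_i - x_control
        yardstick = tbar[k] * math.sqrt(1.0 / n_i + 1.0 / n_control)
        records.append((label, k, diff, yardstick))
        if diff <= yardstick:       # not significantly greater: stop here
            return label, records
    return None, records


# LeBlanc Test A, comparisons with the water control (group 1).
groups_a = [(3, 0.68, 59.2), (4, 0.68, 59.2), (5, 0.68, 59.2),
            (6, 1.57, 5.8), (7, 2.92, 59.2)]
tbar_120 = {5: 1.772, 4: 1.765, 3: 1.754}   # only the k values actually reached
matc_a, records_a = williams_step_down(0.79, 59.2, groups_a, tbar_120)
for label, k, diff, yardstick in records_a:
    print(f"group {label} (k = {k}): M - X = {diff:.2f}, yardstick = {yardstick:.3f}")
print("MATC group:", matc_a)
```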

Group 7 versus Group 1: k = 5, $M_7 - X_1 = 2.92 - 0.79 = 2.13$,
$\bar{t}\,(1/n_7 + 1/n_1)^{1/2} = 1.772\,(2/59.2)^{1/2} = 0.326$.
Thus $M_7$ is significantly greater than $X_1$.

Group 6 versus Group 1: k = 4, $M_6 - X_1 = 1.57 - 0.79 = 0.78$,
$\bar{t}\,(1/n_6 + 1/n_1)^{1/2} = 1.765\,(1/59.2 + 1/5.8)^{1/2} = 0.768$.
Thus $M_6$ is significantly greater than $X_1$ (but just barely).

Group 5 versus Group 1: k = 3, $M_5 - X_1 = 0.68 - 0.79 < 0$.
Thus $M_5$ is not significantly greater than $X_1$.

We  thus  stop  the  process  and  declare  group  5  to  be  the  MATC  group.  This 
result  agrees  with  that  previously  arrived  at  based  on  the  chi  square 
test  for  homogeneity. 

We apply this same procedure to the 21 day mortality data from LeBlanc's Test A, but comparing to the solvent control group rather than the water control group. We again adjust the sample sizes based on the results in Table V.3. The basic and transformed responses, pooled across beakers within groups, are:

Group (i)                    2 (Control)       3       4       5       6       7
Sample Size (n_i)                   59.2    59.2    59.2    59.2     5.8    59.2
Response Rate (p_i)                0.037   0.125   0.086   0.088    0.50   0.988
X_i = 2 arcsin(p_i^(1/2))          0.388    0.72    0.60    0.60    1.57    2.92
M_i                                0.388    0.64    0.64    0.64    1.57    2.92
M_i - X_2                              -    0.25    0.25    0.25    1.18    2.53

Since the $X_i$'s are not in monotone sequence they need to be adjusted to the $M_i$'s. The comparisons and yardsticks used are directly analogous to those discussed previously.

Group 7 versus Group 2: k = 5, $\bar{t}\,(1/n_7 + 1/n_2)^{1/2} = 1.772\,(2/59.2)^{1/2} = 0.326 < M_7 - X_2 = 2.53$. Thus $M_7$ is significantly greater than $X_2$.

Group 6 versus Group 2: k = 4, $\bar{t}\,(1/n_6 + 1/n_2)^{1/2} = 1.765\,(1/59.2 + 1/5.8)^{1/2} = 0.768 < M_6 - X_2 = 1.18$. Thus $M_6$ is significantly greater than $X_2$.

Group 5 versus Group 2: k = 3, $\bar{t}\,(1/n_5 + 1/n_2)^{1/2} = 1.754\,(2/59.2)^{1/2} = 0.322 > M_5 - X_2 = 0.25$. Thus $M_5$ is not significantly greater than $X_2$.

We  thus  stop  the  process  and  declare  group  5  to  be  the  MATC  group.  This 
result  agrees  with  that  arrived  at  above  based  on  comparisons  with  the 
water  control  group.  It  also  agrees  with  the  MATC  arrived  at  based  on  the 
chi  square  test  for  homogeneity. 

We  now  apply  Williams'  procedure  to  the  21  day  mortality  data  from 
LeBlanc's  Test  B,  based  on  comparisons  to  the  combined  solvent  and  water 
control  groups.  We  adjust  the  sample  sizes  based  on  the  results  in 
Table  V.4.  The  basic  and  transformed  responses,  pooled  across  beakers 
within  groups  are: 


Group (i)                    0 (Control)       3       4       5       6       7
Sample Size (n_i)                  115.2    57.6    57.6    57.6    57.6    8.96
Response Rate (p_i)                 0.13   0.049   0.063   0.163   0.099   0.626
X_i = 2 arcsin(p_i^(1/2))           0.74    0.44    0.51    0.83    0.64   1.826
M_i                                0.608   0.608   0.608   0.735   0.735   1.826
M_i - X_0                              -  -0.132  -0.132  -0.005  -0.005   1.086


We use the $\bar{t}$ values appropriate for equireplicated treatment groups, even though the effective sample size in group 7 is smaller than those in the other groups. Since the control group sample size is at least twice that of any of the treatment groups, we utilize the adjustment suggested by Williams [18, Section 2] to account for increased control group replication. Namely, let c denote the control group sample size, let r denote the average treatment group sample size, and let w = c/r. In our example c = 115.2, r = 47.87, and w = 2.41. Williams recommends adjusting $\bar{t}$ downward to $\bar{t} - 10^{-2}\beta(1 - 1/w)$, where values of β are given in Table 1, corresponding to k and ν. We use ν = 120 d.f.
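The quantities in this adjustment for LeBlanc's Test B can be computed with a short Python sketch. The value of β must be looked up in Williams' Table 1 and is not reproduced here. For that reason the sketch only defines the adjustment as a function, and it evaluates the yardstick with the adjusted value 1.743 used in the comparison that follows. The names are illustrative.

```python
import math


def adjusted_tbar(tbar, beta, c, r):
    """Downward adjustment tbar - 1e-2 * beta * (1 - 1/w), with w = c / r."""
    w = c / r
    return tbar - 1e-2 * beta * (1.0 - 1.0 / w)


# LeBlanc Test B: combined control, then groups 3-7 (effective sample sizes).
c = 115.2
treatment_sizes = [57.6, 57.6, 57.6, 57.6, 8.96]
r = sum(treatment_sizes) / len(treatment_sizes)
w = c / r

# Adjusted critical value for k = 5, nu = 120, as quoted in the text; the beta
# behind it comes from Williams' Table 1 and is not assumed here.
tbar_adj = 1.743
yardstick = tbar_adj * math.sqrt(1.0 / 8.96 + 1.0 / c)
print(f"r = {r:.2f}, w = {w:.2f}, yardstick = {yardstick:.3f}")
```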

Group 7 versus Control: k = 5, $\bar{t}\,(1/n_7 + 1/n_0)^{1/2} = 1.743\,(1/8.96 + 1/115.2)^{1/2} = 0.605 < M_7 - X_0 = 1.086$. Thus $M_7$ is significantly greater than $X_0$.

Since all the other $M_i$'s are less than $X_0$, we conclude that $M_6$ is not significantly greater than $X_0$. We thus stop the process and declare group 6 to be the MATC group. This result agrees with that previously arrived at based on the chi square test for homogeneity.

As the final example in this set we apply Williams' procedure to the 21 day mortality data from Goulden's Isophorone Test, using the data from the multiply housed daphnids. There was no evidence of beaker to beaker heterogeneity, so we carry out comparisons based on the unadjusted sample sizes. The basic and transformed responses, pooled across beakers within groups, are:
Group (i)                    1 (Control)       2       3       4       5       6
Sample Size (n_i)                   15.0    15.0    15.0    15.0    15.0    15.0
Response Rate (p_i)                0.133   0.067   0.067   0.200   0.533   0.867
X_i = 2 arcsin(p_i^(1/2))           0.75    0.52    0.52    0.93    1.64    2.39
M_i                                 0.60    0.60    0.60    0.93    1.64    2.39
M_i - X_1                          -0.15   -0.15   -0.15    0.18    0.89    1.64

We use the $\bar{t}$ values appropriate for equireplicated treatment groups and ν = 14 × 6 = 84 d.f.

Group 6 versus Control: k = 5, $\bar{t}\,(1/n_6 + 1/n_1)^{1/2} = 1.780\,(2/15)^{1/2} = 0.650 < M_6 - X_1 = 1.64$. Thus $M_6$ is significantly greater than $X_1$.

Group 5 versus Control: k = 4, $\bar{t}\,(1/n_5 + 1/n_1)^{1/2} = 1.773\,(2/15)^{1/2} = 0.647 < M_5 - X_1 = 0.89$. Thus $M_5$ is significantly greater than $X_1$.

Group 4 versus Control: k = 3, $\bar{t}\,(1/n_4 + 1/n_1)^{1/2} = 1.762\,(2/15)^{1/2} = 0.643 > M_4 - X_1 = 0.18$. Thus $M_4$ is not significantly greater than $X_1$.

We  thus  stop  the  process  and  declare  group  4  to  be  the  MATC  group.  This 
result  agrees  with  that  previously  arrived  at  based  on  the  chi  square  test 
for  homogeneity. 

In summary, in all the examples we have considered, the chi square test and Williams' test lead to the same conclusions about the MATC. This is because there is generally either low mortality or high mortality observed, with few groups falling in the middle of the dose response curve. In such cases there are no borderline situations, and so most reasonable test procedures will arrive at the same conclusion. This situation, of course, does not hold in general.
G. ONE SIDED COMPARISONS BASED ON THE COCHRAN-ARMITAGE TEST

In  the  previous  subsections  of  this  section  we  have  considered  several 
procedures  to  test  for  the  presence  of  statistically  significant  differences 
between  treatment  group  and  control  group  mortality  rates.  In  Subsection  D 
we  considered  the  overall,  analysis  of  variance  type  Pearson  chi  square  test. 
In  Subsections  E  and  F  we  considered  one  sided  tests  that  are  designed  to 
be  more  sensitive  to  monotone  alternatives.  In  this  subsection  we  consider 
a  generalization  of  the  chi  square  test,  due  to  Cochran  and  Armitage,  that 
is  also  more  sensitive  to  one  sided,  monotone  alternatives. 

The Cochran-Armitage test is appropriate when the experimental groups possess an intrinsic ordering, as is the case in aquatic toxicity tests. A score is attached to each group, so that an ordered scale is created. These scores are treated as predictor variables. The null hypothesis of equal response probabilities across groups is then tested against an alternative of some type of trend of response probabilities with increasing score. Let $X_0, X_1, X_2, \ldots, X_k$ denote the scores assigned to the control group and the k treatment groups. Let $p_0, p_1, \ldots, p_k$ denote the mortality probabilities in these groups. Then the hypotheses can be expressed as

$H_0: p_0 = p_1 = \cdots = p_k$

versus

$H_A: p_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r.$

Cochran and Armitage chose the simplest type of trend, a straight line trend, to test against, but this test can be extended to polynomials of higher order. Thus we will test $H_0$ against the alternative hypothesis

$H_1: p_i = \beta_0 + \beta_1 X_i, \quad \beta_1 > 0.$
This test has been described and illustrated in a number of references. Snedecor and Cochran [20], Section 9.11, Steel and Torrie [21], Section 22.10, Fleiss [22], Section 9.2, Cochran [23], and Armitage [24] discuss this procedure in some detail. In essence, the homogeneity chi square with k degrees of freedom is partitioned into two parts. A single degree of freedom component tests for a straight line trend, and a residual component with k-1 degrees of freedom tests for departures from linearity. The residual component can be further decomposed, if desired, into quadratic components, cubic components, etc. Expressions for the decomposition are given by Fleiss [22], Section 9.2, Equations (9.17) - (9.26).
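The partition can be written out in the notation of this subsection: scores $X_i$, observed rates $p_i$, effective sizes $n_i$, and pooled rate $\bar p$ with $\bar q = 1 - \bar p$. One standard form is given below. The symbols $\bar X$ and $S_{xx}$ are introduced here for convenience only.

```latex
\[
\bar p = \frac{\sum_i n_i p_i}{\sum_i n_i}, \qquad
\bar X = \frac{\sum_i n_i X_i}{\sum_i n_i}, \qquad
S_{xx} = \sum_i n_i (X_i - \bar X)^2
\]
\[
\chi^2_{\mathrm{tot}} = \frac{\sum_i n_i (p_i - \bar p)^2}{\bar p \, \bar q}
\quad (k \text{ d.f.}), \qquad
\chi^2_{\mathrm{slope}} = \frac{\left[\sum_i n_i (p_i - \bar p)(X_i - \bar X)\right]^2}{\bar p \, \bar q \, S_{xx}}
\quad (1 \text{ d.f.})
\]
\[
\chi^2_{\mathrm{linearity}} = \chi^2_{\mathrm{tot}} - \chi^2_{\mathrm{slope}} \quad (k - 1 \text{ d.f.})
\]
```

The square root of $\chi^2_{\mathrm{slope}}$, taken with the sign of the slope, is the one sided Cochran-Armitage statistic.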

The Cochran-Armitage procedure can be carried out rather easily utilizing any linear regression computer program capable of performing weighted least squares fits. Namely, the regression model

$p_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim N(0,\, pq/n_i)$

is fitted to the data. The $p_i$'s are the observed mortality rates based on an effective sample size of $n_i$ daphnids. The $e_i$'s are independent random variables representing the random variation, with mean 0 and variance $pq/n_i$. The probability p is the common value of all the mortality probabilities under the null hypothesis, and q = 1 - p. It is estimated by the total (effective) number of deaths among all the groups divided by the total (effective) number of daphnids in all the groups. Thus the weight, $w_i$, for the i-th group is $n_i/(pq)$.
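The weighted least squares formulation can be sketched directly in Python without a regression package. This is a minimal sketch under binomial theory, with the residual variance fixed at 1. The names are illustrative, and the sketch is not the BMDP5R computation. It is applied here to the Goulden isophorone counts (2, 1, 1, 3, 8, and 13 deaths out of 15 per group, reconstructed from the pooled rates in Subsection F), with group indexes as scores. It returns a Pearson total of 36.50, matching Subsection D4, together with the slope and departure from linearity components.

```python
import math


def cochran_armitage(deaths, sizes, scores):
    """Cochran-Armitage trend test written as a weighted least squares slope.

    Weights are n_i / (p q), so the residual variance is fixed at 1 as under
    binomial theory.  Returns (slope, standard error, z, (total, slope, linearity)).
    """
    total_n = sum(sizes)
    p_bar = sum(deaths) / total_n
    pq = p_bar * (1.0 - p_bar)                     # assumes 0 < p_bar < 1
    x_bar = sum(n * x for n, x in zip(sizes, scores)) / total_n
    rates = [d / n for d, n in zip(deaths, sizes)]
    sxx = sum(n * (x - x_bar) ** 2 for n, x in zip(sizes, scores))
    sxp = sum(n * (x - x_bar) * (p - p_bar) for n, x, p in zip(sizes, scores, rates))
    slope = sxp / sxx
    se = math.sqrt(pq / sxx)
    z = slope / se                                 # compare with a standard normal (one sided)
    chi_total = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, rates)) / pq
    chi_slope = z ** 2
    return slope, se, z, (chi_total, chi_slope, chi_total - chi_slope)


slope, se, z, (chi_total, chi_slope, chi_lin) = cochran_armitage(
    [2, 1, 1, 3, 8, 13], [15] * 6, [1, 2, 3, 4, 5, 6])
print(f"slope = {slope:.5f}, std err = {se:.5f}, z = {z:.3f}")
print(f"total = {chi_total:.2f} (5 d.f.), slope = {chi_slope:.2f} (1 d.f.), "
      f"linearity = {chi_lin:.2f} (4 d.f.)")
```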

This procedure was carried out using the BMDP polynomial regression computer program BMDP5R. See Dixon and Brown [7] for a detailed description of this program. Cubic polynomials were fitted rather than straight lines; however, the Cochran-Armitage test can be carried out based on this output. The tests were carried out sequentially.

The chief objection to the Cochran-Armitage test and its generalizations is the somewhat arbitrary assignment of scores. However, Snedecor and Cochran [20], page 246, report that moderate differences in the scoring systems usually would not produce substantial differences in the conclusions from the analyses. We have used two sets of scores that seem natural, namely group indexes and logarithmic concentration. Since the concentrations were selected to be approximately equally spaced in the log domain, one would expect approximately the same results with both sets of scores.

We  present  below  applications  of  the  Cochran-Armitage  test  to  the 
mortality  responses  in  LeBlanc's  Tests  A  and  B  and  in  Goulden’s  Isophorone 
Test.  In  LeBlanc's  Test  A  we  carried  out  separate  comparisons,  using  the 
water  control  and  using  the  solvent  control  groups  as  standards. 


G1. LeBlanc Test A - Comparison With Water Control Group - Scores are Group Indices


We adjust for the effects of beaker to beaker heterogeneity within groups by utilizing the sample sizes and numbers of responses shown in Table V.3. The estimate of p is 88.1/301.8 = 0.292. The $n_i$'s are 59.2 except for group 6, for which $n_6 = 5.8$. The scores assigned to the groups are $X_1 = 1$, $X_3 = 2$, $X_4 = 3$, $X_5 = 4$, $X_6 = 5$, and $X_7 = 6$. Group 2, the solvent control group, is excluded from the analysis.

We fitted a cubic polynomial by weighted least squares using weights $w_i = n_i/(pq)$, as described above. Figure IX.14 displays the output, showing the details of the straight line submodel fit and the goodness of fit tests for the linear, quadratic, cubic, and departures from cubic models. The estimate, $\hat\beta_1$, of the slope $\beta_1$ is 0.15720 and appears at the upper right of the figure. A standard error estimate, 0.06390, appears to the right of $\hat\beta_1$, but this value must be modified before being used. Namely, the model fitted by the weighted least squares program is

$p_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim N(0,\, \sigma^2/w_i).$
Under binomial theory, σ = 1. However, the algorithm estimates σ² by the residual mean square, namely $\hat\sigma^2 = 17.66954$, and incorporates $\hat\sigma$ in all the standard error estimates and thus in all the t-ratios. We normalize the standard error estimate back to σ = 1 by dividing it by $\hat\sigma$. Thus, under binomial theory,

$\mathrm{std\ err}(\hat\beta_1) = 0.06390/(17.66954)^{1/2} = 0.01520.$

The appropriate "t-value" is

$t = 0.15720/0.01520 = 10.341.$

This value can also be calculated by multiplying the stated t-value by $\hat\sigma$, namely $(2.46)(17.66954)^{1/2} = 10.341$. To test $H_0: \beta_1 = 0$ versus $H_1: \beta_1 > 0$ we compare t with the percentiles of a standard normal distribution. A one tailed test is appropriate. Thus $\hat\beta_1$ is significant at the α = 0.0000 level, and so we strongly reject $H_0$.
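The rescaling of the BMDP5R output to binomial theory is simple arithmetic. The sketch below repeats it in Python with the values read from Figure IX.14 and also computes a one sided normal tail probability with `math.erfc`. The variable names are illustrative.

```python
import math

# Values read from Figure IX.14 (LeBlanc Test A, water control, group indexes as scores).
slope = 0.15720
se_reported = 0.06390
t_reported = 2.46
residual_mean_square = 17.66954   # the program's estimate of sigma^2

sigma_hat = math.sqrt(residual_mean_square)
se_binomial = se_reported / sigma_hat        # rescale to sigma = 1
t_binomial = slope / se_binomial
t_check = t_reported * sigma_hat             # same value from the printed t-ratio
p_one_sided = 0.5 * math.erfc(t_binomial / math.sqrt(2.0))   # upper normal tail

print(f"std err = {se_binomial:.5f}, t = {t_binomial:.3f} (check {t_check:.3f})")
print(f"one sided alpha = {p_one_sided:.2e}")
```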

The sums of squares at the bottom of Figure IX.14 can also be used to test hypotheses about the homogeneity of, or trend in, the $p_i$'s. The sum of squares corresponding to degree 0 represents the variability explained by fitting a cubic polynomial to the $p_i$'s over and above a constant term. (Note that the entries in the column labeled "Sum of Squares" are to be used for inferences, rather than the entries in the column labeled "F", since under binomial theory the residual mean square is known to be 1.) The sums of squares corresponding to degrees 1 and 2 represent the variability explained by fitting a cubic polynomial to the $p_i$'s over and above linear and quadratic polynomials, respectively. The residual sum of squares represents departures from a cubic polynomial. These sums of squares can be used to form various test statistics:

Degree 0 + Resid = 176.964 + 0.670 = 177.634 with 3 + 2 = 5 d.f.

This represents the deviation from the model of homogeneous probabilities; that is, it is the usual Pearson chi square statistic. We see that this agrees with the chi square value obtained in Subsection D1 and is highly significant.

Degree 0 - Degree 1 = 176.96388 - 70.00806 = 106.956 with 3 - 2 = 1 d.f.

This represents the variation explained by a linear trend term in the $p_i$'s. This sum of squares is compared to the percentiles of a chi square distribution with 1 d.f. and is highly significant. This test for linear trend is essentially equivalent to the Cochran-Armitage test. Note that $(106.956)^{1/2} = 10.342$, which agrees with the t-value calculated previously, except that the chi square test is two tailed rather than one tailed.

Degree  1  +  Resid  =  70.00806  +  0.67011  =  70.678  with  2+2=4  d.f. 


This represents deviation from the model of straight line trend. It is highly significant. There is thus evidence of departure from a straight line trend. The above decomposition is that which is usually referred to as

$\chi^2_{\mathrm{tot}} = \chi^2_{\mathrm{slope}} + \chi^2_{\mathrm{linearity}}.$

Degree 2 + Resid = 5.96232 + 0.67011 = 6.632 with 1 + 2 = 3 d.f.


This represents deviations from a quadratic model. It is significant at α = 0.08 and so represents only a marginal suggestion of departure from a quadratic trend. There is no evidence of departure from a cubic trend.
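The chi square statistics formed from the sums of squares in Figure IX.14 can be tabulated, with their tail probabilities, by the Python sketch below. It uses the same closed-form chi square tail for integer degrees of freedom as a stand-in for a statistics library. The dictionary keys are illustrative labels, not BMDP output names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1 (closed form)."""
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * j + 1)
    return tail


# Sums of squares from the bottom of Figure IX.14 (LeBlanc Test A, water control).
ss_degree0 = 176.96388
ss_degree1 = 70.00806
ss_degree2 = 5.96232
ss_resid, df_resid = 0.67011, 2

components = {
    "total (homogeneity)": (ss_degree0 + ss_resid, 3 + df_resid),
    "slope": (ss_degree0 - ss_degree1, 3 - 2),
    "departure from linearity": (ss_degree1 + ss_resid, 2 + df_resid),
    "departure from quadratic": (ss_degree2 + ss_resid, 1 + df_resid),
}
tails = {name: chi2_sf(stat, df) for name, (stat, df) in components.items()}
for name, (stat, df) in components.items():
    print(f"{name}: chi square = {stat:.3f} with {df} d.f., alpha = {tails[name]:.3g}")
```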

Since the Cochran-Armitage test is highly significant using all the groups, we delete group 7 from the data and recalculate the test statistics. The output appears in Figure IX.15. The meanings of the various estimates are directly analogous to those in the previous figure and so need not be explained again in detail. The weights need to be recalculated because the estimate of p differs from before.

The slope estimate, $\hat\beta_1$, is now negative and very close to 0; namely, $\hat\beta_1 = -0.00353$. The standard error is $0.03189/(3.14541)^{1/2} = 0.0180$. Thus the t-value is

$t = -0.00353/0.0180 = -0.196,$

which is nonsignificant.

The decomposition of the Pearson chi square statistic is

$\chi^2_{\mathrm{tot}} = 8.733 + 0.742 = 9.475$ with 4 d.f.

$\chi^2_{\mathrm{slope}} = 8.733 - 8.695 = 0.038$ with 1 d.f.

$\chi^2_{\mathrm{linearity}} = 8.695 + 0.742 = 9.437$ with 3 d.f.

The total chi square is marginally significant (α = 0.05). The slope chi square is not significant. The departure from linearity chi square is significant (α = 0.024). The departure from quadratic chi square is also marginal (α = 0.07). We thus conclude that there is some statistical evidence of departure from homogeneity, but the heterogeneity is not linear. We see from Figure II.1 that the mortality rate is roughly constant in groups 1-5 and then increases in group 6. This is not a straight line trend, especially since the group 6 responses are discounted so heavily relative to those in the other groups.

Since the overall and the departure from linearity chi squares are
significant, we delete group 6 from the data and recalculate the test
statistics. The output appears in Figure IX.16.

The slope estimate is again negative and very close to 0; namely
$\hat{\beta}_1 = -0.02247$. The standard error is $0.00530/(0.08299)^{1/2} = 0.0184$.
Thus the t-value is

$$t = -0.02247/0.0184 = -1.221,$$

which is nonsignificant. (It is almost significant in the wrong direction.)
The total chi square is nonsignificant (α = 0.35), the slope chi square is
not significant (α = 0.22), and the departure from linearity chi square is
not significant (α = 0.92). The process is thus stopped at this point and
group 5 is declared to be the MATC group. This agrees with the results
from the chi square test and from Williams' test.

The tests that were carried out in the previous analyses were based on
the normal and the chi square distributions. A slight refinement of this
procedure would be to base these inferences on the t and F distributions,
using denominator degrees of freedom calculated from Table V.3. These
degrees of freedom are 153, 123, and 120 for the three sets of comparisons.
They are sufficiently large that use of the normal and chi square
distributions makes little difference.
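The claim that 120 or more degrees of freedom make the t and normal distributions practically interchangeable is easy to check numerically. The following sketch uses only the Python standard library; the helper names are ours. It integrates the t density with Simpson's rule, inverts the c.d.f. by bisection, and compares upper 5 percent points with the normal value.

```python
import math
from statistics import NormalDist


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(t * t / df))


def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about zero."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area


def t_quantile(prob: float, df: int) -> float:
    """Invert t_cdf by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


z95 = NormalDist().inv_cdf(0.95)
for df in (153, 123, 120):
    print(f"d.f. = {df}: t(0.95) = {t_quantile(0.95, df):.4f}, normal = {z95:.4f}")
```

The differences appear only in the second decimal place, which is why the normal approximation is adequate here.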

The  Cochran-Armitage  test  was  recalculated  using  the  solvent  control 
group  for  comparison  instead  of  the  water  control  group.  The  results  were 
similar  to  those  above  and  are  not  shown.  Group  5  was  declared  to  be  the 
MATC  group.  The  only  difference  in  outcomes  was  that  the  Cochran-Armitage 
test was significant (α = 0.04) after the data from group 7 were deleted.

G2. LeBlanc Test B - Comparison With Combined Water and Solvent Control
Groups - Scores are Group Indices


We  adjust  for  the  effects  of  beaker  to  beaker  heterogeneity  within  groups 
by  using  the  sample  sizes  and  numbers  of  responses  shown  in  Table  V.4. 

Since there was no statistically significant difference in the mortality
rates in the solvent and water control groups, the responses in these two
groups were combined for comparison with the treatment groups. The estimate
of p for all the groups is 42.5/354.54 = 0.120. The $n_i$'s are 57.6 except
for group 7, for which $n_7 = 8.96$. The scores assigned to the groups are
$X_1 = 1$, $X_2 = 1$, $X_3 = 2$, $X_4 = 3$, $X_5 = 4$, $X_6 = 5$, and $X_7 = 6$.
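With these effective sample sizes and scores, the standard error of the slope can be reproduced from the usual Cochran-Armitage variance formula, $\mathrm{Var}(\hat{\beta}_1) = \hat{p}(1-\hat{p})/\sum_i n_i (X_i - \bar{X})^2$, where $\bar{X}$ is the $n_i$-weighted mean score. In the Python sketch below (the function name is ours) the $n_i$ are the rounded values quoted above. It gives about 0.0110 for all seven groups and about 0.0111 with group 7 deleted, in line with the standard errors reported for these fits; the computer printouts factor the same quantity into a different numerator and denominator.

```python
import math


def ca_slope_se(n, scores, p_hat):
    """Standard error of the Cochran-Armitage slope estimate.

    Var(slope) = p_hat * (1 - p_hat) / sum_i n_i * (x_i - x_bar)**2,
    where x_bar is the n-weighted mean score.
    """
    total = sum(n)
    x_bar = sum(ni * xi for ni, xi in zip(n, scores)) / total
    ss_x = sum(ni * (xi - x_bar) ** 2 for ni, xi in zip(n, scores))
    return math.sqrt(p_hat * (1.0 - p_hat) / ss_x)


# LeBlanc Test B, combined controls: effective sample sizes and group-index scores.
n_eff = [57.6] * 6 + [8.96]
scores = [1, 1, 2, 3, 4, 5, 6]
se_all = ca_slope_se(n_eff, scores, 42.5 / 354.54)

# Group 7 deleted: p is re-estimated from groups 1-6 only.
se_1_6 = ca_slope_se(n_eff[:6], scores[:6], 36.9 / 345.6)
print(f"{se_all:.4f} {se_1_6:.4f}")
```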

The  models  fitted  are  the  same  as  those  discussed  for  Test  A,  namely 
cubic  polynomials  in  the  scores.  The  interpretations  of  the  computer 
printouts  are  also  the  same  as  those  for  Test  A  and  so  need  not  be  explained 
in  detail. 

The output showing the details of the straight line submodel fit and
goodness of fit tests for the linear, quadratic, cubic, and departures from
cubic models is displayed in Figure IX.17. The slope estimate is
$\hat{\beta}_1 = 0.01744$. The estimated standard error is
$0.02761/(6.24926)^{1/2} = 0.0110$. Thus the t-value is

$$t = 0.01744/0.0110 = 1.579.$$

Comparing t with the percentiles of the standard normal distribution, we
see that t is marginally significant at the (one-tailed) α = 0.057 level.
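The one-tailed levels in this appendix come from the upper tail of the standard normal distribution. A minimal Python sketch (the function name is ours) reproduces the values for the LeBlanc Test B fits. Each standard error is computed directly from the printed quantities, for example $0.02761/(6.24926)^{1/2}$, so that the rounding matches the quoted t-values.

```python
from statistics import NormalDist
from typing import Tuple


def upper_trend_test(slope: float, se: float) -> Tuple[float, float]:
    """Return (t, one-tailed level P(Z >= t)) for a test of an increasing trend."""
    t = slope / se
    return t, 1.0 - NormalDist().cdf(t)


fits = [
    ("indices, all groups", 0.01744, 0.02761 / 6.24926 ** 0.5),
    ("log conc., all groups", 0.00743, 0.04346 / 6.82295 ** 0.5),
    ("log conc., group 7 omitted", -0.01111, 0.02170 / 1.78317 ** 0.5),
]
for label, slope, se in fits:
    t, level = upper_trend_test(slope, se)
    print(f"{label:28s} t = {t:6.3f}  one-tailed alpha = {level:.3f}")
```

A negative slope gives a one-tailed level above 0.5, which is why a decreasing estimate can never count as evidence of an upward trend.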

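The decomposition of the Pearson chi square statistic is

$$
\begin{aligned}
\chi^2_{\text{tot}} &= 15.394 + 12.097 = 27.491 \qquad \text{with } 3 + 2 = 5 \text{ d.f.}\\
\chi^2_{\text{slope}} &= 15.394 - 12.900 = 2.494 \qquad \text{with 1 d.f.}\\
\chi^2_{\text{linearity}} &= 12.900 + 12.097 = 24.997 \qquad \text{with } 2 + 2 = 4 \text{ d.f.}
\end{aligned}
$$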

The total chi square and departure from linearity chi square are each highly
significant. The slope chi square is at best borderline (α = 0.11),
reflecting the marginally significant slope discussed above. Thus there is
strong statistical evidence of departures from homogeneity of mortality
rates across groups, but only marginal evidence of a linear trend. The sums
of squares in Figure IX.17 suggest that the trends in the $\hat{p}_i$'s are
reflected mostly in the quadratic component and in departures from a cubic
polynomial (perhaps a fourth degree term). We see from Figure II.3 that this
behavior is due to the substantial increase in mortality rate in group 7
after being relatively constant in groups 1-6. This resembles either a
quadratic or a quartic trend.

Since the Cochran-Armitage statistic is marginally significant, the
departure from linearity chi square is highly significant, and Figure II.3
reveals a sharp upward trend in mortality rates at group 7, we continue the
process. We delete group 7 from the data and recalculate the test statistics.
The output appears in Figure IX.18. Note that the weights for this analysis
differ from those used before, since the estimate of p, based on groups 1-6,
is 36.9/345.6 = 0.107. The models fitted and the interpretations of the
computer printouts are the same as before. The slope estimate is
$\hat{\beta}_1 = 0.0000$. The estimated standard error is
$0.01552/(1.939)^{1/2} = 0.011$. The t-value is of course 0. Thus there is no
linear trend whatsoever in the mortality rates in groups 1-6.

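The decomposition of the Pearson chi square statistic is

$$
\begin{aligned}
\chi^2_{\text{tot}} &= 5.28888 + 0.52815 = 5.817 \qquad \text{with 4 d.f.}\\
\chi^2_{\text{slope}} &= 5.28888 - 5.28888 = 0 \qquad \text{with 1 d.f.}\\
\chi^2_{\text{linearity}} &= 5.28888 + 0.52815 = 5.817 \qquad \text{with 3 d.f.}
\end{aligned}
$$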

The total chi square and slope chi square statistics are each nonsignificant
(α = 0.21 and α = 1.0, respectively). The departure from linearity chi
square is perhaps marginal (α = 0.12). Breaking this chi square up into
quadratic, cubic, and residual terms, we see that the cubic component
dominates the other two and is significant at the α = 0.04 level. This is a
marginal level of significance and might simply be the result of selection.
Since there is a nonsignificant total chi square, no linear component to
the trend, a nonsignificant quadratic component (α = 0.28), a marginal
cubic component, and no monotone trend evident in Figure II.3, we conclude
that there is no discernible upward trend in mortality rates in groups 1-6
and we stop the process. Group 6 is declared to be the MATC group. This
agrees with the conclusions from the chi square test and from Williams' test.
G3. LeBlanc Test B - Comparison With Combined Water and Solvent Control
Groups - Scores are Log (Concentration)

We repeat the analyses carried out in paragraph G2 above, but base the
group scores on log (concentration) rather than on group indices. Everything
else stays the same. The aim of this reanalysis is to determine whether the
use of these two different sets of scores leads to different conclusions.

The output showing the details of the straight line submodel fit and
goodness of fit tests for the linear, quadratic, cubic, and departures from
cubic models is displayed in Figure IX.19. The slope estimate is
$\hat{\beta}_1 = 0.00743$. The estimated standard error is
$0.04346/(6.82295)^{1/2} = 0.0166$. Thus the t-value is

$$t = 0.00743/0.0166 = 0.447.$$

This value of t is nonsignificant (α = 0.33) based on the upper tail of the
normal distribution.

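The decomposition of the Pearson chi square statistic is

$$
\begin{aligned}
\chi^2_{\text{tot}} &= 20.171 + 7.320 = 27.491 \qquad \text{with } 3 + 2 = 5 \text{ d.f.}\\
\chi^2_{\text{slope}} &= 20.171 - 19.972 = 0.199 \qquad \text{with 1 d.f.}\\
\chi^2_{\text{linearity}} &= 19.972 + 7.320 = 27.292 \qquad \text{with 4 d.f.}
\end{aligned}
$$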

The total chi square and departure from linearity chi square are each highly
significant. The slope chi square is nonsignificant (α = 0.66). Comparing
Figures II.3 and II.4, we see that the linear trend is steeper when scores
are based on indices rather than log (concentration), since the control group
is then much less separated from the treatment groups. Based on the sums
of squares at the bottom of Figure IX.19, we see that the major components
of the departure from linearity chi square are the quadratic (χ² = 14.82)
and departure from cubic (χ² = 7.32) components. We see from Figure II.4 that
the trend in mortality rates with increasing concentration resembles a
quadratic or a quartic curve.

Since the departure from linearity chi square is highly significant
and Figure II.4 reveals a sharp upward trend in mortality rates at group 7,
we continue the process. We delete group 7 from the data and recalculate
the test statistics. The output appears in Figure IX.20. The slope estimate
is now negative, $\hat{\beta}_1 = -0.01111$. The estimated standard error is
$0.02170/(1.78317)^{1/2} = 0.01625$. The t-value is

$$t = -0.01111/0.01625 = -0.684.$$

This is nonsignificant (one-tailed α = 0.75). Thus there is no statistical
evidence of linear trend in the mortality rates in groups 1-6.


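The decomposition of the Pearson chi square statistic is

$$
\begin{aligned}
\chi^2_{\text{tot}} &= 5.817 \qquad \text{with 4 d.f.}\\
\chi^2_{\text{slope}} &= 0.467 \qquad \text{with 1 d.f.}\\
\chi^2_{\text{linearity}} &= 5.350 \qquad \text{with 3 d.f.}
\end{aligned}
$$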


The total, slope, and departure from linearity chi squares are each
nonsignificant (α = 0.21, α = 0.49, and α = 0.15, respectively). The largest
component of chi square is the quadratic component (χ² = 2.524), which is
at best marginal (α = 0.11). Furthermore, no monotone trend is evident in
Figure II.4. We conclude that there is no discernible upward trend in
mortality rates in groups 1-6 and we stop the process. Group 6 is declared
to be the MATC group. This agrees with the conclusions from the chi square
test, from Williams' test, and from the Cochran-Armitage test using group
indices as scores.


G4.  Goulden  Isophorone  Test  -  Scores  are  Group  Indices 

We consider a second data set to compare the results of applying the
Cochran-Armitage test with scores based on group indices to the results
with scores based on log concentration. We confine attention to the three
beakers per group containing multiple daphnids (5 daphnids per beaker).

Since there is no evidence of beaker to beaker heterogeneity within groups,
no adjustments of sample sizes were carried out. There is just one control
group.

The details of the straight line submodel fit and goodness of fit tests
are displayed in Figure IX.21. The interpretation of the entries in the
output is the same as that discussed previously for the LeBlanc data.

The slope estimate is $\hat{\beta}_1 = 0.1486$ and its estimated standard error is
$0.04395/(2.36554)^{1/2} = 0.0286$. Thus the t-value is

$$t = 0.1486/0.0286 = 5.200.$$

This value is very highly significant, based on the upper tail of the normal
distribution. The goodness of fit chi squares reflect the highly
statistically significant linear trend component, as well as significant
(α = 0.01) departures from linearity. The trend in mortality rates is
clearly evident in Figure II.8.

Since the Cochran-Armitage statistic and the departure from linearity
chi square are both highly significant and Figure II.8 reveals a trend in
mortality rates, we continue the process. We delete group 6 from the data
and recalculate the test statistics. The output appears in Figure IX.22.

The slope has been reduced from 0.1486 to 0.0933, but this is still highly
significant. The t-value is 2.857, which is significant at the α = 0.002
level. The departure from linearity chi square is marginal (α = 0.11).

Since the Cochran-Armitage statistic is significant and Figure II.8
reveals a trend even after deleting group 6, we continue the process. We
delete group 5 from the data and recalculate the test statistics. The output
appears in Figure IX.23. The slope estimate has now been reduced to 0.02 and
is no longer significant (α = 0.29). The departure from linearity chi square
is also nonsignificant (α = 0.49). We thus stop the process and declare
group 4 to be the MATC group. This agrees with the results of the chi square
test, the measure of association test, and Williams' test.

G5.  Goulden  Isophorone  Test  -  Scores  are  Log  (Concentration) 

We repeat the analyses carried out in paragraph G4 above using the same
data but with scores based on log (concentration). In particular, since
the control group is at (nominal) concentration 0, we define the scores
to be log (1 + concentration).

The computer outputs corresponding to all the groups, group 6 omitted,
and groups 5 and 6 omitted are displayed in Figures IX.24, IX.25, and IX.26,
respectively. Based on all the data, both the Cochran-Armitage test and
the departure from linearity chi square test are significant (α = 0.01 and
α = 0.0000, respectively). After deleting group 6, both these statistics
are still significant (α = 0.025 and α = 0.0002, respectively). After
deleting groups 5 and 6, neither of these statistics is significant
(α = 0.41 and α = 0.18, respectively). We thus stop the process at this
point and declare group 4 to be the MATC group. This result agrees with that
obtained using the group indices as scores and with those obtained from the
chi square, measure of association, and Williams' tests.

In summary, we have illustrated the Cochran-Armitage test and its
generalizations on three data sets. On two of the data sets we used scores
based both on group indices and on log (concentration). While there were
some relatively minor differences in detailed results, both sets of scores
led to the same conclusion about the MATC group in both data sets. This
suggests that the results of the Cochran-Armitage procedure are not very
sensitive to moderate differences in scores, such as those between the two
sets we used. Thus the Cochran-Armitage test appears to be a reasonable
procedure to use for the comparison of mortality rates in chronic Daphnia
tests. This matter needs further empirical or theoretical study utilizing a
number of other data sets and choices of scores.
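The stepwise use of the test in paragraphs G1 through G5 can be summarized as a step-down loop: test all groups, and while either the Cochran-Armitage trend statistic or the departure from linearity chi square is significant, delete the highest concentration group and retest. The Python sketch below automates only that numerical part of the rule; in the analyses above the decisions also drew on the plots in Section II and on higher-order components of the chi square. It assumes a single control group (the two controls pooled, when combined), distinct scores, binomial counts already reduced to effective sample sizes, a one-tailed normal test for the trend, and a hypothetical significance level `alpha`. All function names are ours.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail chi square probability for integer d.f. (closed-form recurrence)."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    q, k = (math.erfc(math.sqrt(half)), 1) if df % 2 else (math.exp(-half), 2)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def trend_and_departure(n, r, x):
    """One-tailed Cochran-Armitage level and departure-from-linearity level.

    n: effective sample sizes, r: numbers dead, x: scores, one entry per group.
    """
    total = sum(n)
    p_bar = sum(r) / total
    if not 0.0 < p_bar < 1.0:
        return 1.0, 1.0  # all alive or all dead: no evidence of a trend
    pq = p_bar * (1.0 - p_bar)
    x_bar = sum(ni * xi for ni, xi in zip(n, x)) / total
    ss_x = sum(ni * (xi - x_bar) ** 2 for ni, xi in zip(n, x))
    slope = sum((xi - x_bar) * (ri - ni * p_bar) for ni, ri, xi in zip(n, r, x)) / ss_x
    t = slope / math.sqrt(pq / ss_x)
    chi_total = sum((ri - ni * p_bar) ** 2 / (ni * pq) for ni, ri in zip(n, r))
    chi_departure = max(chi_total - t * t, 0.0)  # total = slope + departure
    groups = len(n)
    p_departure = chi2_sf(chi_departure, groups - 2) if groups > 2 else 1.0
    return 1.0 - NormalDist().cdf(t), p_departure


def step_down_matc(n, r, x, alpha=0.05):
    """Delete the highest group while either statistic is significant at alpha.

    Returns the 1-based number of the highest group retained; 1 means that
    only the control group is left.
    """
    k = len(n)
    while k > 1:
        p_trend, p_departure = trend_and_departure(n[:k], r[:k], x[:k])
        if p_trend >= alpha and p_departure >= alpha:
            break
        k -= 1
    return k


# Illustrative counts only (not from any of the tests above): control + 5 treatments.
print(step_down_matc([20] * 6, [2, 2, 3, 2, 10, 18], [1, 2, 3, 4, 5, 6]))
```

In this made-up example groups 6 and 5 are deleted in turn, and the loop stops with group 4 retained.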
Figure IX.4. Chi square test of homogeneity across groups - LeBlanc test A - 21 day
mortality - solvent control group - group 7 omitted - EXAX2 output.

Figure IX.5. Chi square test of homogeneity across groups - LeBlanc test A - 21 day
mortality - solvent control group - groups 6 and 7 omitted - EXAX2 output.

Figure IX.7. Chi square test of homogeneity across groups - LeBlanc test B - 21 day
mortality - solvent and water control groups combined - group 7 omitted - EXAX2 output.

Figure IX.8. Chi square test of homogeneity across groups - Goulden isophorone test - 21 day
mortality - multiply housed daphnids - EXAX2 output.

Figure IX.9. Chi square test of homogeneity across groups - Goulden isophorone test - 21 day
mortality - multiply housed daphnids - EXAX2 output - group 6 omitted.

Figure IX.10. Chi square test of homogeneity across groups - Goulden isophorone test - 21 day
mortality - multiply housed daphnids - EXAX2 output - groups 5 and 6 omitted.

BMDP1F output from one sided measure of association test - Goulden
isophorone test - 21 day mortality.

Figure IX.13. BMDP1F output from one sided measure of association test - Goulden
isophorone test - 21 day mortality - groups 5 and 6 omitted.

Figure IX.14. Cochran-Armitage test of homogeneity across groups - LeBlanc test A - 21 day
mortality - water control group. Scores are group indices.

Figure IX.16. Cochran-Armitage test of homogeneity across groups - LeBlanc test A - 21 day
mortality - groups 6 and 7 omitted - water control group. Scores are group indices.

Figure IX.17. Cochran-Armitage test of homogeneity across groups - LeBlanc test B - 21 day
mortality - combined control groups. Scores are group indices.

Figure IX.18. Cochran-Armitage test of homogeneity across groups - LeBlanc test B - 21 day
mortality - group 7 omitted - combined control groups. Scores are group indices.

Figure IX.19. Cochran-Armitage test of homogeneity across groups - LeBlanc test B - 21 day
mortality - combined control groups. Scores are log (concentration).

Figure IX.20. Cochran-Armitage test of homogeneity across groups - LeBlanc test B - 21 day
mortality - group 7 omitted - combined control groups. Scores are log (concentration).

Figure IX.21. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
21 day mortality. Scores are group indices.

Figure IX.22. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
group 6 omitted - 21 day mortality. Scores are group indices.

Figure IX.23. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
groups 5 and 6 omitted - 21 day mortality. Scores are group indices.

Figure IX.24. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
21 day mortality. Scores are log (1 + concentration).

Figure IX.25. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
group 6 omitted - 21 day mortality data. Scores are log (1 + concentration).

Figure IX.26. Cochran-Armitage test of homogeneity across groups - Goulden isophorone test -
groups 5 and 6 omitted - 21 day mortality data. Scores are log (1 + concentration).


X.  DOSE  RESPONSE  CURVE  ESTIMATION-PROBIT  ANALYSIS 
OF  MORTALITY  RATES 


A.  INTRODUCTION.  DOSE  RESPONSE  CURVE  ESTIMATION  VS  HYPOTHESIS  TESTING. 
DESCRIPTIONS  OF  MODELS 


An alternative approach to hypothesis testing and multiple comparisons
for the determination of acceptable concentrations with respect to
mortality is to fit dose response models to the mortality data and
estimate the concentrations, $C_L$, that result in incremental mortality of
at most L over and above the background level. The problem of determining
a safe concentration is thereby transformed from a hypothesis testing problem
(determine which treatment group mortality rates are statistically
significantly different from the control rate) to an estimation problem
(obtain point estimates of and confidence intervals on $C_L$).

The hypothesis testing and dose response curve estimation problems are
conceptually different and have different implications. A number of these
differences were discussed by Feder and Collins [1], Section XIV.

Hypothesis testing procedures provide information as to whether treatment
group responses are statistically significantly different from control
group responses. They say nothing about biologically significant
differences. That is, the differences between treatment group and control
group mortality rates may be highly statistically significant yet
biologically trivial if the sample sizes are large enough. Conversely,
biologically important differences may not be found statistically
significant if the sample sizes are too small. By contrast, dose response
curve estimation procedures estimate those concentration levels that
produce biologically significant changes in response.

Biological  significance  is  quantified  by  stating  the  increments  from 
control  rates  that  are  considered  to  be  important — for  example  10  percent, 
25  percent,  etc.,  increases  in  mortality.  Point  and  confidence  interval 
inferences  about  concentrations  associated  with  such  differences  are  then 
constructed. 

An additional conceptual difference between the two types of procedures
is reflected in the effects of changes in sample size on inferences. With
the classical hypothesis testing formulation, the larger and more precise
the experiment, the more powerful the hypothesis test. Thus lower
concentration levels will yield responses that are statistically
significantly different from the control group response, and the MATC will
be decreased. Conversely, the smaller the experiment and the more variable
the responses, the greater the MATC. The effects of increased sample sizes
and precision on inferences based on dose response curves are exactly the
opposite. The larger and more precise the experiment, the tighter the
confidence bounds on $C_L$ (the concentration associated with an incremental
response of L). Thus the lower confidence bound on $C_L$ will be increased.
We feel that the latter situation is as it should be. The more
extensive  and  the  more  precise  the  supporting  evidence,  the  more  liberal 
should  be  the  determinations  of  safe  concentration.  Thus  inferences  about 
acceptable  concentrations  based  on  dose  response  curves  depend  on  sample 
size  in  the  manner  they  should  whereas  inferences  based  on  hypothesis  tests 
depend  on  sample  size  in  a  manner  opposite  to  what  they  should. 

Feder  and  Collins  [1]  discuss  these  conceptual  differences  at  greater 
length.  They  illustrate  a  number  of  different  dose  response  models  and 
computer  programs  for  fitting  such  models.  In  particular  they  illustrate 
probit  models  to  describe  trends  in  mortality  rates  with  increases  in 
either  concentration  or  log  (concentration).  See  Sections  XIV,  XV,  XVI  of 
Feder  and  Collins  for  details.  Some  of  the  topics  discussed  in  those 
sections  include: 

•  Fitting  probit  models  in  concentration  or  log  concentration  using 
the  special  purpose  probit  analysis  module,  PROC  PROBIT,  in  the  SAS 
statistical  computing  system  [10]. 

•  Fitting probit and logit models in concentration or log concentration
using the general purpose SAS nonlinear regression procedure, PROC
NLIN.

•  Fitting  adjustments  for  background  mortality  using  Abbott's 
correction  and  alternatives. 

•  Fitting  nonparametric  dose  response  models  that  yield  conservative 
lower  confidence  bounds  on  safe  concentrations  without  assuming 
specific  parametric  forms  for  the  response  curve. 

In  this  section  we  discuss  fitting  three  parameter  probit  models  to  the 
21-day  mortality  responses,  using  either  concentration  or  log  concentration 
as  the  independent  variable,  as  appropriate.  Background  mortality  is 
accounted for by Abbott's correction. See Finney [4] for a detailed
description of this model and associated inferences.

Feder  and  Collins  [1]  discuss  the  tradeoffs  involved  in  not  adjusting 
for  background  mortality.  That  is,  a  two  parameter  probit  model, 
incorporating  the  assumption  of  zero  background  mortality,  could  be  fitted 
to  the  data.  If  background  mortality  is  in  fact  present,  then  the 
estimated  mortality  rates  based  on  these  two  parameter  models  would  be 
biased  downwards  toward  zero,  especially  at  the  low  mortality  end  of  the 
curve.  However  the  standard  errors  of  the  estimates  there  would  be 
reduced.  Therefore  if  the  background  mortality  is  in  fact  nonzero  but  is 
not  too  far  different  from  zero,  the  resulting  increased  precision  might 
more  than  offset  the  downward  biases,  thus  resulting  in  more  accurate 
estimates. 

This suggests that a special study should be made to determine how
large the estimates of background mortality would need to be before
background adjustments would be called for. The greater the quantity and
the precision of the data, the smaller the background mortality that could
be tolerated before background adjustments were needed. In the absence of
the results of such a detailed study, we have adopted here the somewhat
arbitrary rule that if the estimated background mortality rate parameter in
a three parameter probit fit is significantly different from zero at the 5
percent level, then we will retain it in the model, thereby adjusting for
background.

The  background  mortality  rates  in  the  various  data  sets  we  have  looked 
at  are  generally  fairly  high,  often  in  excess  of  10  percent.  In  the 
combined  vehicle  and  water  control  groups  in  LeBlanc's  tests  A  and  B,  15  of 
160 and 21 of 160 daphnids died, respectively. In Adams' test, 1 of 15
control daphnids died. In Chapman's test, 2 of 20 control daphnids died.
In Goulden's test, 2 of 15 of the multiply housed control daphnids died.

We  fitted  three  parameter  probit  models  to  the  21-day  mortality  data 
from  LeBlanc's  tests  A  and  B  and  from  Goulden's  isophorone  test.  In  each 
case  the  background  rate  was  significant  at  the  5  percent  level;  we  thus 
retained  these  corrections  for  background. 

All probit fits were carried out using the general purpose BMDP
nonlinear regression program, BMDPAR. See Dixon and Brown [7] for a detailed
description of the computer program. See Jennrich and Moore [25] for a
discussion of the theory underlying the use of nonlinear regression methods
to perform maximum likelihood probit analysis fits. Although using
nonlinear regression programs to fit probit dose response models is somewhat
more cumbersome than using special purpose probit analysis programs, this
approach has a number of advantages. These include:

•  Program  availability.  A  user  may  not  have  a  special  purpose 
computer  program  accessible  that  will  fit  three  parameter  probit 
models.  In  particular,  SAS  PROC  PROBIT  is  available  only  to  SAS 
users.  By  contrast,  any  nonlinear  regression  program  with  the 
capability  of  calculating  weighted  least  squares  fits  with 
iteratively  recalculated  weights  can  be  used  to  fit  probit  models  in 
the  manner  discussed  in  this  section. 

•  Flexibility  of  models.  Various  functional  forms  such  as  probit, 
logit,  or  generalizations  of  these  such  as  discussed  by  Prentice 
[26]  can  be  specified.  Transformations  of  concentration  such  as 
logarithm,  square  root,  etc.,  can  be  specified.  Alternatives  to 
Abbott's  correction  for  background  can  be  specified  (see  e.g.,  Feder 
and  Collins  [1],  Section  XV),  such  as  specifying  background  as  an 
effective  addition  to  the  toxicant  concentration.  Centering  and 
scale  constants  can  be  included  in  the  models  to  reduce  correlations 
among  model  terms  and  to  improve  the  numerical  convergence 
properties.  Dose  response  curves  resulting  from  different 
experiments  can  be  compared  with  one  another. 

•  Enhanced  residual  analysis  capability.  Predicted  and  residual 
values  can  be  saved  and  further  studied  with  subsequent  statistical 
analyses  and  data  displays. 


We have fitted standard three parameter probit models with either
logarithmic (common logs were used) or untransformed concentration levels.
These models can be expressed as

$$p(\text{conc}) = p_0 + (1 - p_0)\,\Phi\bigl(\beta_0 + \beta_1 (z - m)\bigr)$$

where $p_0$ and $p(\text{conc})$ are the response rates at concentrations zero and
"conc", respectively, $z$ is either "conc" or $\log_{10}(\text{conc})$, $m$ is a fixed
centering constant to reduce the correlation among model terms and thereby
improve convergence, $\Phi(\cdot)$ is the standard normal c.d.f., and
$p_0, \beta_0, \beta_1$ are unknown parameters to be estimated from the model fit
to the data.
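For reference, the fitted curve and the concentration $C_L$ defined at the start of this section can be evaluated as follows. This Python sketch assumes that "incremental mortality of L over background" is measured on the Abbott-corrected scale, $(p(C_L) - p_0)/(1 - p_0) = L$, which gives $z = m + (\Phi^{-1}(L) - \beta_0)/\beta_1$; the `extra_risk=False` branch instead uses the simple difference $p(C_L) - p_0 = L$. The function names and the parameter values in the usage line are hypothetical, and the sketch gives point estimates only, not confidence bounds.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def abbott_probit(conc, p0, b0, b1, m, log_scale=True):
    """p(conc) = p0 + (1 - p0) * Phi(b0 + b1 * (z - m)), z = log10(conc) or conc."""
    if log_scale and conc <= 0.0:
        return p0  # control group: only background mortality on the log scale
    # On the untransformed scale the control is evaluated at z = 0 like any group.
    z = math.log10(conc) if log_scale else conc
    return p0 + (1.0 - p0) * _STD_NORMAL.cdf(b0 + b1 * (z - m))


def c_l(L, p0, b0, b1, m, log_scale=True, extra_risk=True):
    """Concentration with incremental mortality L over p0 (assumes b1 > 0).

    extra_risk=True:  (p(C) - p0) / (1 - p0) = L   (Abbott-corrected scale)
    extra_risk=False: p(C) - p0 = L                (simple difference)
    """
    target = L if extra_risk else L / (1.0 - p0)
    if not 0.0 < target < 1.0:
        raise ValueError("L is not attainable under this model")
    z = m + (_STD_NORMAL.inv_cdf(target) - b0) / b1
    return 10.0 ** z if log_scale else z


# Hypothetical parameter values, for illustration only.
print(f"C_0.10 = {c_l(0.10, p0=0.10, b0=0.0, b1=3.0, m=1.0):.3f}")
```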

An alternative adjustment for background mortality would be the model

$$p(\text{conc}) = \Phi\bigl(\alpha_0 + \alpha_1 \log_{10}(\text{conc} + c)\bigr)$$

where $p(\text{conc})$ is the response rate at conc, $c$ is the effective additive
background concentration, and $\alpha_0, \alpha_1, c$ are unknown constants to be
estimated from the model fit. We decided not to use this alternative
functional form because the shapes of the concentration-response relations
displayed in Figures II.1-II.10 do not reflect this behavior. The wide range
of concentrations at the low end of the mortality curve with nonzero but
essentially constant mortality contradicts the relationship that an additive
background concentration would predict. We thus confined attention to
Abbott's correction for background; however, the alternative model would be
no more difficult technically to specify and fit than the standard probit
model.
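The shape argument can be seen directly from the additive-background form. Since $p(0) = \Phi(\alpha_0 + \alpha_1 \log_{10} c)$, a toxicant concentration equal to $c$ doubles the effective concentration and raises the probit by $\alpha_1 \log_{10} 2$, so mortality must begin to climb once the toxicant concentration is comparable to $c$. The Python sketch below tabulates this with hypothetical values (10 percent background, $\alpha_1 = 3$, $c = 0.5$); the function names are ours, and it illustrates the model's behavior rather than fitting any of the data sets.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def additive_background(conc, a0, a1, c):
    """p(conc) = Phi(a0 + a1 * log10(conc + c)), c = effective background concentration."""
    return _N.cdf(a0 + a1 * math.log10(conc + c))


def a0_for_background(p_background, a1, c):
    """Choose a0 so that p(0) = Phi(a0 + a1 * log10(c)) equals p_background."""
    return _N.inv_cdf(p_background) - a1 * math.log10(c)


# Hypothetical values: 10 percent background, probit slope 3 per log10 unit, c = 0.5.
a1, c = 3.0, 0.5
a0 = a0_for_background(0.10, a1, c)
for conc in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"conc = {conc:4.2f}: p = {additive_background(conc, a0, a1, c):.3f}")
```

With these values mortality rises from 10 percent to roughly 35 percent by the time the toxicant concentration equals $c$, so the model cannot reproduce a long flat stretch above background.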

Beaker-to-beaker heterogeneity within groups was adjusted for by reducing
the actual sample sizes and numbers of responses to effective values, as
discussed in Subsection VB, and carrying out subsequent analyses using these
adjusted values. If desired, the t and F distributions with degrees of
freedom based on those given in Tables V.3 and V.4 might be substituted for
the normal and chi square distributions when making inferences from the
probit fits. However, for LeBlanc's data sets these degrees of freedom are
sufficiently large that the substitution would produce no differences of
practical importance.


B.  MAXIMUM  LIKELIHOOD  PROBIT  ANALYSIS  BY  NONLINEAR  LEAST  SQUARES 
REGRESSION 


Jennrich  and  Moore  [25]  show  that  for  distributions  in  the  exponential 
family,  maximum  likelihood  estimation  can  be  carried  out  by  means  of 
nonlinear  least  squares  regression.  This  applies,  in  particular,  to  models 
based on the binomial distribution. Both BMDP [7] (P3R and PAR) and SAS [10] (PROC NLIN) contain nonlinear regression modules that can be used to fit various dose response curve models. Any nonlinear regression program with an iteratively reweighted least squares capability would suffice.


We fit the three parameter probit model discussed in the introduction, namely

$$p(\text{conc}) = p_0 + (1 - p_0)\,\Phi\left(\beta_0 + \beta_1 (z - m)\right)$$

where z is conc or log10(conc) and m is a fixed centering constant. The theory underlying the fit is discussed in general by Jennrich and Moore [25] and specifically for the probit model by Feder and Collins [1], Section XV. We discuss the details of the model fits and the resulting estimates below. We consider fits to the 21-day mortality responses from LeBlanc's Tests A and B and Goulden's Isophorone test. Under the assumptions of the nonlinear regression model, the dependent variable $X_i$ (the observed response proportion in the i-th group) has mean $\mu_i(\theta)$ and variance $\sigma_i^2(\theta)$, where $\theta = (p_0, \beta_0, \beta_1)$, $N_i$ is the effective sample size in the i-th group, and

$$\mu_i(\theta) = p_i(\theta) = p(\text{conc}_i)$$

$$\sigma_i^2(\theta) = \frac{p_i(\theta)\,\left(1 - p_i(\theta)\right)}{N_i}$$

We  now  consider  the  details  of  the  fits  to  each  of  the  data  sets  in  turn. 


LeBlanc  Test  A  -  Solvent  Control  Group  -  Logarithmic  Concentration 


We first discuss the details of the model fitting procedure using BMDPAR. (We used this program rather than BMDP3R because with BMDPAR we do not need to specify the functional forms of the derivatives of $p(\theta)$.) The BMDPAR program commands needed to generate the fit are given below. See the BMDP manual [7], pp. 484-514, for further details. Various lines in the program command file are numbered. These numbered lines are explained further below.

Line  1  instructs  the  computer  system  to  list  the  basic  data  prior  to 
analysis. 

Lines 2 attach the appropriate BMDP program and instruct it as to where the data are to be found. Note that the system instructions above this point pertain only to CDC systems and undoubtedly differ at other installations.

Lines  3  are  the  basic  input  data.  The  four  variables  represent  group 
number  (one  control  and  five  treatment  groups),  concentration,  effective 
sample  size,  and  effective  number  of  responses. 

Lines 4 are a FORTRAN subroutine that specifies the form of the dose response function and the form of the caseweights for the least squares fits. In this example, F is the probit dose response function, PHI is the standard normal c.d.f., and X(6) is the caseweight, which is $1/\sigma_i^2(\theta)$.

Lines 5 are standard BMDP control language commands which specify the source, form, identifiers, and desired transformations of the data. In this example the centering constant, m, is denoted ZBAR and is set equal to -2.113.

Lines 6 and 7 specify the dependent variable, P, and the weight variable, CASEWT. The observed P corresponds to the predicted F in the subroutine. The variable CASEWT is identified with X(6) in the subroutine. X(6) and F are evaluated for each case at every iteration of the fitting procedure.

Line 8 is a technical fine point which instructs the fitting algorithm to find a stationary point rather than to minimize the residual sum of squares. These two objectives differ because the weights depend on the unknown parameters. The distinction corresponds to the theoretical distinction between a maximum likelihood estimate and a minimum chi square estimate. While of theoretical interest, this detail is not of much practical importance and so will not be pursued further. See the BMDP manual [7] or Jennrich and Moore [25] for further discussion.
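The distinction drawn in line 8 can be made concrete with a small iteratively reweighted least squares loop. In the sketch below (Python with NumPy, an assumption; it is not the BMDPAR algorithm), the caseweights $N_i/[p_i(\theta)(1-p_i(\theta))]$ are evaluated at the current parameters and then held fixed while a weighted Gauss-Newton step is taken. The fixed point of this scheme solves $J'W(y - p) = 0$, which is the binomial likelihood equation, so the iteration settles at a stationary point (the maximum likelihood estimate) rather than at the minimizer of the weighted residual sum of squares. The functions play the roles of F, PHI, and the caseweight X(6) in the subroutine; unlike BMDPAR, derivatives are supplied analytically, and the step-halving rule is a simple stand-in for BMDPAR's increment halving. All names and the demonstration data are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal c.d.f., elementwise (plays the role of PHI)."""
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in np.atleast_1d(x)])


def norm_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def probit3_mean(theta, z, m):
    """p_i(theta) = p0 + (1 - p0) * Phi(b0 + b1 * (z_i - m)) (plays the role of F)."""
    p0, b0, b1 = theta
    return p0 + (1.0 - p0) * norm_cdf(b0 + b1 * (z - m))


def probit3_jacobian(theta, z, m):
    """Analytic derivatives of p_i with respect to (p0, b0, b1)."""
    p0, b0, b1 = theta
    eta = b0 + b1 * (z - m)
    dens = (1.0 - p0) * norm_pdf(eta)
    return np.column_stack([1.0 - norm_cdf(eta), dens, dens * (z - m)])


def log_lik(theta, z, m, n, y):
    """Binomial log-likelihood (up to a constant) for proportions y on n trials."""
    p = np.clip(probit3_mean(theta, z, m), 1e-12, 1.0 - 1e-12)
    return float(np.sum(n * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def fit_probit3(z, n, y, m, theta0, max_iter=200, tol=1e-10):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        p = probit3_mean(theta, z, m)
        w = n / (p * (1.0 - p))      # caseweights 1/sigma_i^2(theta), frozen for this step
        J = probit3_jacobian(theta, z, m)
        JtW = J.T * w                # J'W with W = diag(w)
        step = np.linalg.solve(JtW @ J, JtW @ (y - p))
        # Simple step halving: keep p0 in [0, 1) and do not let the likelihood fall.
        ll_old, scale = log_lik(theta, z, m, n, y), 1.0
        while True:
            cand = theta + scale * step
            ok = 0.0 <= cand[0] < 1.0 and log_lik(cand, z, m, n, y) >= ll_old - 1e-12
            if ok or scale < 1e-8:
                break
            scale /= 2.0
        theta = cand
        if np.max(np.abs(scale * step)) < tol:
            break
    p = probit3_mean(theta, z, m)
    J = probit3_jacobian(theta, z, m)
    # Asymptotic covariance with k fixed at 1 (no rescaling by the residual mean square).
    cov = np.linalg.inv((J.T * (n / (p * (1.0 - p)))) @ J)
    return theta, cov


# Hypothetical demonstration: noise-free proportions generated from known parameters.
z = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
n = np.full(6, 40.0)
true_theta = np.array([0.10, -0.5, 1.5])
y = probit3_mean(true_theta, z, 0.0)
theta_hat, cov = fit_probit3(z, n, y, 0.0, [0.15, -0.3, 1.0])
print(theta_hat)
```

Because the demonstration proportions contain no noise, the likelihood equation is solved exactly at the generating parameters, so the fit should return them; with real data the weighted residual sum of squares at this point is generally not its minimum, which is the behavior noted for Figure X.1.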

Line 9 tells the program that the variances as specified in the model should be used for the calculation of the standard errors. More precisely, we specified caseweights equal to $1/\sigma_i^2(\theta)$, or equivalently variances of the dependent variable equal to $\sigma_i^2(\theta)$. However, the least squares algorithm usually assumes that the variances are $k\sigma_i^2(\theta)$, where k is estimated from the residual mean square. The command in line 9 instructs the program to set k equal to 1.

Line  10  specifies  the  initial  parameter  values  with  which  to  start  the 
iteration.  These  values  are  often  estimated  from  a  preliminary  analysis  or 
graphical  display  of  the  data.  In  this  case  they  were  based  on  the  last 
set  of  parameter  values  from  a  previous  run  (not  shown)  which  did  not 
converge.  The  previous  run  was  started  with  initial  values  based  on 
graphing  the  data. 

Line  11  specifies  that  observed,  predicted,  and  residual  values  are  to 
be  plotted  vs  concentration  and  log  concentration. 

The output from these commands appears in Figures X.1 to X.4. Figure X.1, top, contains summary information about the problem specifications and input variables. The second portion of the figure contains the summary results of the iterative Gauss-Newton procedure. For each iteration the (weighted) residual sum of squares and updated parameter values are shown. The algorithm converges to a stationary point after 20 iterations. At this point, incremental changes in the residual sum of squares function are essentially zero. Thus this point corresponds to the maximum likelihood estimates. Note that this point does not minimize the residual sum of squares function, which was smaller at the starting values. This fine point was discussed in the introduction. The bottom portion of the figure contains statistics for the converged model. Parameter estimates are given for the three parameters in the model, namely $\hat p_0 = 0.083$, $\hat\beta_0 = -2.521$, $\hat\beta_1 = 8.108$. The fitted model is

$$\hat p(\text{conc}) = \hat p_0 + (1 - \hat p_0)\,\Phi\left(\hat\beta_0 + \hat\beta_1\left(\log_{10}(\text{conc}) + 2.113\right)\right)$$

The  estimated  asymptotic  standard  errors  of  the  parameter  estimates  are 
given  beneath  the  estimates.  These  values  are  used  for  inferences. 

Figure  X.2  contains,  for  each  case,  the  observed,  predicted  and 
residual  values  based  on  the  fit.  Also  given  are  the  estimated  standard 
errors  of  the  predictions  and  the  values  of  the  various  input  variables  for 
that case. The plots in Figures X.3 and X.4 show good agreement between
observed  values  and  predictions  based  on  the  model.  The  greatest 
discrepancies  occur  at  the  control  group  and  at  the  lowest  concentration 
group,  where  the  observed  responses  are  most  ragged.  However  even  here, 
the  discrepancies  are  not  large. 

The residual sum of squares is 3.01107 with 3 degrees of freedom. This value represents the chi square statistic for goodness of fit of the model. If the model fits the data, then this statistic should have a chi square distribution with 3 d.f. Alternatively, using the residual degrees of freedom (153) calculated from Table V.3, we might compare the residual mean square to the percentiles of the F distribution with 3 and 153 degrees of freedom. The upper 90 percent point of the chi square distribution with 3 degrees of freedom is 6.25, well above the observed 3.01. There is thus no evidence of lack of fit.
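For 3 d.f. the chi square upper-tail probability has a closed form, $Q(x) = 2[1-\Phi(\sqrt{x})] + \sqrt{2x/\pi}\,e^{-x/2}$, so the goodness of fit comparison can be checked with the Python standard library alone. The sketch also computes $\sqrt{\mathrm{RSS}/\mathrm{d.f.}}$, the factor by which standard errors would change if k in line 9 were estimated from the residual mean square instead of being fixed at 1. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi square variate with 3 d.f. (closed form)."""
    r = math.sqrt(x)
    return 2.0 * (1.0 - NormalDist().cdf(r)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def se_inflation(rss: float, df: int) -> float:
    """Standard error multiplier if k were estimated as RSS/df rather than fixed at 1."""
    return math.sqrt(rss / df)


p_test_a = chi2_sf_3df(3.01107)    # LeBlanc Test A, solvent control, log concentration
p_check = chi2_sf_3df(6.251)       # the quoted upper 90 percent point; should give about 0.10
k_factor = se_inflation(3.01107, 3)
print(f"P(chi2_3 > 3.011) = {p_test_a:.3f}; check = {p_check:.4f}; SE factor = {k_factor:.4f}")
```

For the Test A fit the tail probability is about 0.39 and the standard error factor is about 1.002, so fixing k = 1 makes essentially no difference here; the same function gives about 0.12 for the Test B residual sum of squares of 5.7839 reported below.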

An important purpose of fitting the probit model is to calculate point and confidence interval estimates of acceptable concentrations. Feder and Collins [1], Section XV, present a method of calculating approximate confidence intervals by means of Fieller's Theorem (Finney [4], pp. 78-79). We present below an alternative method of calculating approximate confidence intervals by means of the delta method (Cramer [27], pp. 366-367). We wish to calculate a point estimate and confidence interval on the concentration, $C_L$, such that

$$\Phi\left(\beta_0 + \beta_1\left(\log_{10} C_L + 2.113\right)\right) = L$$

where L is some specified incremental response rate over and above the control rate (e.g., L = 0.05, 0.10, 0.25, etc.). L represents the response rate attributed to the toxicant (over and above the background level).

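Let $z_L = \log_{10} C_L + 2.113$. We first calculate point and confidence interval estimates of $z_L$, and then translate them into corresponding estimates of $C_L$. The point estimate, $\hat z_L$, is

$$\hat z_L = \frac{\Phi^{-1}(L) - \hat\beta_0}{\hat\beta_1} = \frac{f_L - \hat\beta_0}{\hat\beta_1}, \qquad f_L = \Phi^{-1}(L)$$

Let $\hat\Sigma$ denote the estimated asymptotic variance-covariance matrix of $(\hat\beta_0, \hat\beta_1)$. Then

$$\widehat{\mathrm{Var}}(\hat z_L) = \left(-\frac{1}{\hat\beta_1},\ \frac{\hat\beta_0 - f_L}{\hat\beta_1^2}\right)\hat\Sigma\left(-\frac{1}{\hat\beta_1},\ \frac{\hat\beta_0 - f_L}{\hat\beta_1^2}\right)'$$

An approximate $1-\alpha$ confidence interval on $z_L$ is

$$\hat z_L \pm c_{\alpha/2}\left[\widehat{\mathrm{Var}}(\hat z_L)\right]^{1/2} = (z_\ell, z_u)$$

where $c_{\alpha/2}$ is the upper $\alpha/2$ percentile of the standard normal distribution. The theoretical justification of these expressions is given in Appendix AX.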


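The values of $\hat\beta_0$, $\hat\beta_1$, and $\hat\Sigma$ are obtained from the BMDPAR output, in particular from Figure X.1. The corresponding estimate and confidence interval for $C_L$ are

$$\hat C_L = 10^{\hat z_L - 2.113} \quad \text{and} \quad C_L \in \left(10^{z_\ell - 2.113},\ 10^{z_u - 2.113}\right)$$

For the LeBlanc Test A solvent control data, $\hat\beta_0 = -2.52089$ and $\hat\beta_1 = 8.27337$, and $\hat\Sigma$ is formed from the estimated standard errors (outer diagonal matrices) and the estimated correlation matrix (middle):

$$\hat\Sigma = \begin{pmatrix} 1.20098 & 0 \\ 0 & 2.49065 \end{pmatrix}\begin{pmatrix} 1.0000 & -0.9546 \\ -0.9546 & 1.0000 \end{pmatrix}\begin{pmatrix} 1.20098 & 0 \\ 0 & 2.49065 \end{pmatrix} = \begin{pmatrix} 1.44235 & -2.8554 \\ -2.8554 & 6.2033 \end{pmatrix}$$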

The  results  of  the  calculations  are  given  below. 

LeBlanc  Test  A  -  Solvent  Control  Group-Point  Estimates  and 
95  Percent  Confidence  Intervals  on  Various  Percentiles  of 
the  Probit  Fit — by  Delta  Method 
These results show that the dose response curve rises very rapidly between the estimated 5th and 50th percentiles. In fact, the upper confidence bound on $C_{.05}$ exceeds the estimate of $C_{.50}$. Thus dose response curve percentiles with very different biological implications cannot be well separated based on the results of this test.
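The delta method calculation for the Test A solvent control fit can be sketched with the Python standard library. This minimal sketch plugs in the quoted values of $\hat\beta_0$, $\hat\beta_1$, and $\hat\Sigma$ and the centering shift 2.113, and checks the statement that the upper bound on $C_{.05}$ exceeds the $C_{.50}$ estimate. The function name delta_method_percentile is hypothetical, and the sketch is not meant to reproduce the report's table digit for digit.

```python
import math
from statistics import NormalDist


def delta_method_percentile(L, b0, b1, cov, shift, alpha=0.05):
    """Estimate C_L with Phi(b0 + b1 * (log10(C_L) + shift)) = L, plus an
    approximate 1 - alpha interval from the delta method applied to z_L."""
    f_L = NormalDist().inv_cdf(L)
    z_hat = (f_L - b0) / b1
    g0, g1 = -1.0 / b1, (b0 - f_L) / b1**2        # gradient of z_L in (b0, b1)
    var = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var)

    def to_conc(zv):
        return 10.0 ** (zv - shift)                 # invert z = log10(C) + shift

    return to_conc(z_hat), (to_conc(z_hat - half), to_conc(z_hat + half))


# LeBlanc Test A, solvent control group: estimates and covariance quoted above.
b0, b1 = -2.52089, 8.27337
cov = [[1.44235, -2.8554], [-2.8554, 6.2033]]
results = {L: delta_method_percentile(L, b0, b1, cov, shift=2.113)
           for L in (0.05, 0.10, 0.25, 0.50)}
c05, (c05_lo, c05_hi) = results[0.05]
c50, _ = results[0.50]
for L, (est, (lo, hi)) in results.items():
    print(f"L={L:.2f}: C_L={est:.5f}  95% CI=({lo:.5f}, {hi:.5f})")
```

With these values the sketch gives $\hat C_{.50}$ of about 0.0155 and an upper 95 percent bound on $C_{.05}$ of about 0.0165. Because the interval is built on the log scale, it is asymmetric about $\hat C_L$ on the concentration scale.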

We  carried  out  similar  analyses  using  the  water  control  group  rather 
than  the  solvent  control  group.  The  three  parameter  probit  model  fitted  to 
logarithmic concentration again fitted the data well. The details of the fit are not shown; however, the estimated background mortality rate is 0.117 (with a standard error of 0.021), as compared to 0.0831 (with a standard error of 0.018) based on the solvent control group. These adjusted background mortality rates are not significantly different (statistically or biologically).
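A rough check of the statement that the two estimated background rates do not differ significantly is a z statistic built from the two estimates and their standard errors. This sketch treats the two estimates as independent, which is an assumption: both fits use the same treatment groups, so the estimates are likely correlated and the result is only a rough guide. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_estimate_z(est1, se1, est2, se2):
    """Approximate z statistic and two-sided p-value for est1 - est2,
    assuming (for illustration only) that the two estimates are independent."""
    z = (est1 - est2) / math.sqrt(se1**2 + se2**2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Water versus solvent control background mortality estimates from the probit fits.
z, p = two_estimate_z(0.117, 0.021, 0.0831, 0.018)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Here z is about 1.23 (two-sided p about 0.22), consistent with the conclusion above.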

The results of the percentile estimates and confidence interval calculations are given below.
LeBlanc Test A - Water Control Group-Point Estimates
and  95  Percent  Confidence  Intervals  on  Various 
Percentiles  of  Probit  Fit 
For practical purposes, the results are the same as those based on the solvent control group. Thus even though the unadjusted mortality rates in the water and solvent control groups are (statistically) significantly different, the estimates of dose response curve percentiles are virtually unaffected.


LeBlanc  Test  B — Combined  Control  Groups 

We  now  consider  the  results  of  fitting  the  three  parameter  probit  model 
to  the  21-day  mortality  results  from  LeBlanc's  Test  B.  The  results  in  the 
water  and  solvent  control  groups  were  combined  and  used  for  comparison 
purposes.  The  measured  concentration  levels  in  these  two  groups  were 
averaged.  Effective  sample  sizes  and  numbers  of  responses,  as  shown  in 
Table  V.4,  were  used  in  the  analysis. 

We first attempted to fit a probit model using logarithmic concentration. The results are shown in Figures X.5-X.7. The algorithm would not converge when the increment halving option was removed, so the option was retained; thus the values converged to are not strictly maximum likelihood estimates. However, the sum of squares is nearly at a stationary point and the parameter estimates have settled down. The plot in Figure X.7 shows about as good a fit as can be expected to such nonmonotone responses. The residual sum of squares, 5.7839 with 3 degrees of freedom, is large but not statistically significant. (The upper 90 percent point of the chi square distribution with 3 degrees of freedom is 6.251.) Thus we will use these estimates. The parameter estimates in this model are so highly intercorrelated that standard error and correlation estimates cannot be properly calculated. This is the reason for the message at the top of Figure X.6. The reason for this intercorrelation is clearly seen in Figure X.7: the response curve is flat throughout most of the range of concentrations. Only the highest treatment group has a mortality rate substantially in excess of the control rate, and the effective sample size in that group is just 8.96. Thus there is very little information about the slope of the response curve.

We next refitted the probit model using the same data but with untransformed concentration. In particular, the model fitted was

$$p(\text{conc}) = p_0 + (1 - p_0)\,\Phi\left(\beta_0 + \beta_1(\text{conc} - 0.0250)\right)$$

The results are shown in Figures X.8-X.10. As in the first attempt, the algorithm would not converge when the increment halving option was removed. This option was therefore retained, and so the values converged to are again not strictly maximum likelihood estimates. However, since the iteration process has essentially settled down and the fit appears to be good (see Figure X.10), we will use these estimates. The residual sum of squares, 5.8748 with 3 degrees of freedom, is again large but not statistically significant. The intercorrelation among the estimates has been reduced to the extent that estimated correlations and standard errors can at least be calculated. Note that the estimated slope is 15.9695 with a standard error of 12.9349. Thus $\hat\beta_1$ is not significantly different from 0. This high standard error may be due to the very high intercorrelation between the slope and intercept estimates, which in turn is due to the very limited information about the slope: the mortality rate is essentially constant throughout most of the range of concentrations.

The parameters from the model fit were used to calculate point estimates of, and confidence intervals on, the dose response curve percentiles. The results of these calculations are given below.

LeBlanc  Test  B  -  Combined  Control  Groups-Point  Estimates 
and  95  Percent  Confidence  Intervals  on  Various 
Percentiles  of  Probit  Fit 
The confidence intervals, particularly those corresponding to the lower
percentiles,  are  so  wide  as  to  be  useless.  This  is  undoubtedly  due  to  the 
high  degree  of  uncertainty  in  the  parameter  estimates,  as  indicated  by  their 
very  large  standard  errors.  We  thus  conclude  that  we  cannot  make  very 
strong  inferences,  based  on  the  results  of  this  test,  about  the 
concentrations  corresponding  to  various  levels  of  mortality. 

The  final  example  pertains  to  the  21-day  mortality  data  from  Goulden's 
test  on  isophorone. 


Goulden  Isophorone  Test — Logarithmic  Concentration 


As there was just one control group in this test, there is no question about which control group or combination of control groups to use for comparison purposes. The dose response curve is estimated from the results on the multiply housed daphnids only. As there is no evidence of beaker-to-beaker heterogeneity within groups, there is no need to adjust the sample sizes and numbers of responses. The model fitted was

$$p(\text{conc}) = p_0 + (1 - p_0)\,\Phi\left(\beta_0 + \beta_1\left(\log_{10}(\text{conc}) - 1.8352\right)\right)$$

Let $z_L$ denote $\log_{10} C_L - 1.8352$.

The results from the probit fit are shown in Figures X.11-X.13. The Gauss-Newton algorithm converged to the maximum likelihood estimates shown in iteration 15 in Figure X.11. The residual sum of squares, 0.6486 with 3 degrees of freedom, is very small. The fit looks quite good in Figure X.13.

The parameters from the model fit were used to calculate point estimates of, and confidence intervals on, the dose response curve percentiles. The results of the calculations are given below.

Goulden  Isophorone  Test-Point  Estimates  and  95  Percent 
Confidence  Intervals  on  Various  Percentiles  of  Probit  Fit 
Figure X.2. Observed, predicted and residual values from probit fit to LeBlanc Test A 21 day mortality data, solvent control group, logarithmic concentration.

Figure X.3. Observed and predicted values from fit in Figure X.1 versus ...

Figure X.4. Residuals from fit in Figure X.1 versus logarithmic concentration.

Figure X.5. Output from BMDPAR applied to LeBlanc Test B 21 day mortality data, combined control groups, logarithmic concentration.

Figure X.8. Output from BMDPAR applied to LeBlanc Test B 21 day mortality data, combined control groups, untransformed concentration.

Figure X.8. Continued.

Figure X.9. Observed, predicted and residual values from probit fit to LeBlanc Test B 21 day mortality data, combined control groups, untransformed concentration.

Figure X.10. Observed and predicted values from fit in Figure X.8 versus concentration.

Figure X.11. Continued.

[Plot title: PROBIT FIT TO LOG10 CONC, GOULDEN ISOPHORONE TEST]

Figure X.12. Observed, predicted and residual values from probit fit to Goulden isophorone test 21 day mortality data, logarithmic concentration.

Figure X.13. Observed and predicted values from fit in Figure X.11 versus logarithmic concentration (log10(1 + conc)). [Plot: LCONC versus predicted and observed.]


XI.  TESTING  FOR  CONCENTRATION  RELATED  EFFECTS 
ON  REPRODUCTION  AND  LENGTH 


A.  INTRODUCTION 


In  Sections  IX  and  X  we  considered  the  comparison  of  mortality  rates 
across  groups.  Section  IX  was  concerned  with  hypothesis  testing  and 
multiple  comparison  procedures  while  Section  X  was  concerned  with  fitting 
dose  response  curves  and  estimating  safe  concentrations  based  on  the 
percentiles  of  these  curves.  A  directly  analogous  situation  holds  for 
quantitative  responses  such  as  length  and  weight.  In  this  section  we 
consider  hypothesis  testing  and  multiple  comparison  procedures  and  in  the 
next  section  we  consider  dose  response  estimation  and  associated  inferences 
based  on  multiple  regression  models. 

While  the  procedures  employed  to  analyze  the  length  and  reproduction 
responses  are  for  the  most  part  similar  to  those  used  to  analyze  mortality, 
there  are  a  number  of  conceptual  and  technical  differences  in  the  problems. 
Several  of  these  are  discussed  below. 

An important issue is that mortalities censor the nonlethal responses. That is, measurements such as length, reproduction, weight, brood size, etc. can obviously be determined only on survivors. Thus the daphnids upon which the determinations of nonlethal responses are based are not chosen at random, but rather are the hardiest in the groups, since they survived. This can potentially give rise to biased comparisons among groups, since at high concentrations the weaker daphnids would be killed off, while at low concentrations these weaker daphnids would survive and register inferior lengths or reproduction. This could mask dose response effects or even show reverse effects (e.g., greater average lengths among the survivors at the high concentrations than at the low).
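The masking mechanism described above can be illustrated with a toy calculation. The numbers below are hypothetical, and the sketch assumes, purely for illustration, that the weakest daphnids are also the shortest: a toxicant that shortens every animal but also kills the shortest half can still leave survivors whose average length exceeds that of an almost unaffected group.

```python
from statistics import mean

# Hypothetical beaker: ten daphnids whose lengths (mm) track their hardiness.
baseline_lengths = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]


def survivor_mean(lengths, toxic_shrink, n_killed):
    """Mean length among survivors when the toxicant shortens every animal by
    toxic_shrink and kills the n_killed weakest (assumed here to be shortest)."""
    shrunk = sorted(x - toxic_shrink for x in lengths)
    return mean(shrunk[n_killed:])


low = survivor_mean(baseline_lengths, toxic_shrink=0.1, n_killed=0)   # low concentration
high = survivor_mean(baseline_lengths, toxic_shrink=0.4, n_killed=5)  # high concentration
print(f"low-concentration survivors: {low:.2f}, high-concentration survivors: {high:.2f}")
```

The low-concentration survivors average 2.8 while the high-concentration survivors average 3.0, a reverse effect produced entirely by selective mortality.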

There are no completely satisfactory ways to eliminate such potential biases. One partial solution is to regard the responses in a hierarchy. Mortality would be a first order effect. Any concentration which results in "substantial" mortality would be considered unsafe, irrespective of any other responses. Sublethal effects such as reductions in length or reproduction would be considered second order. Among concentrations which pass the mortality screen, any that result in biologically and statistically significant nonlethal effects would be considered unsafe. Thus before testing hypotheses about, or fitting dose response curves to, length or reproduction responses, we delete those groups with "substantial" mortality. We have not precisely defined "substantial" for this purpose. An operational definition such as 20, 30, 50, etc., percent increases in mortality above background might be used. Gelber, et al. [28], suggest deleting all groups at concentrations beyond the MATC "...to achieve comparable numbers of survivors per tank...". Rather than adopting such a formal approach, we made individual decisions on a test-by-test basis. In particular, we deleted the length and reproduction results from Group 7 in LeBlanc's Test A, from Groups 5, 6, 7, 8 in Adams' Selenium test, and from Group 6 in Goulden's Isophorone test. The consequences for interpretation of deleting these high mortality groups from comparisons of nonlethal effects require further discussion and study among toxicologists.

Another difference between the inference problems associated with mortality and those associated with nonlethal responses is related to the monotonicity or lack of monotonicity of the response levels with increasing concentration. The inference procedures for mortality are based on the assumption that mortality rate increases (or at least does not decrease) with increasing concentration. The measure of association test, the Cochran-Armitage Test, Williams' Test, and the probit model all require monotonicity. Such monotonicity does not necessarily hold for length and reproduction. Figures II.11-II.14 and II.21-II.24 show that reproduction levels in LeBlanc's Tests A and B and in Goulden's Isophorone test first increase and then decrease as concentration is increased. Figure II.29 shows that average lengths in LeBlanc's Test B first increase and then decrease as concentration is increased. Thus the inference procedures used must be valid for nonmonotone trends.

Inference  procedures  for  mortality  are  generally  based  on  the  binomial 
distribution.  Such  procedures  tacitly  assume  that  the  variances  of 
responses  are  certain  specified  functions  of  the  means.  Thus  variance 
estimates  need  not  be  supplied.  However  for  quantitative  responses  such  as 
length  and  reproduction,  the  comparable  procedures  are  based  on  regression 
analysis  and  analysis  of  variance  which  do  require  estimates  of 
variability.  The  question  thus  arises  as  to  how  these  variability 
estimates  will  be  calculated.  They  should  simultaneously  account  for 
possible  beaker-to-beaker  heterogeneity  within  groups,  yet  utilize  all  the 
information  in  the  data.  In  Chapman's  and  in  Goulden's  data  sets,  length 
and  reproduction  responses  are  measured  on  daphnids  housed  one  per  beaker. 
Thus  variability  of  responses  is  estimated  based  on  observed  individual 
daphnid-to-daphnid  variability  per  group.  In  LeBlanc's  and  in  Adams’  data 
sets,  reproduction  responses  are  measured  on  a  per  beaker  basis.  Thus  the 
basic  responses  are  numbers  of  offspring  per  beaker,  normalized  to  reflect 
the  numbers  of  surviving  adults.  Thus  variability  is  estimated  based  on 
observed  beaker-to-beaker  variability  per  group.  The  situation  for  the 
length  responses  in  LeBlanc's  tests  is  a  bit  more  complex.  Daphnids  are 
multiply  housed  within  beakers;  however  lengths  are  measured  on  individual 
daphnids.  Thus  we  would  like  to  somehow  use  the  variability  among 
individual  responses.  Yet  these  individual  responses  are  possibly 
correlated  due  to  beaker-to-beaker  heterogeneity  within  groups.  A  scheme 
for  pooling  estimates  of  variability  among  beaker  averages  within  groups 
with  estimates  of  variability  among  daphnids  within  beakers  was  discussed 
in  Subsection  V-D.  This  approach  uses  variance  estimates  based  on 
variability  among  beaker  averages  but  augments  the  degrees  of  freedom  to 
reflect  the  information  about  daphnid-to-daphnid  variability  within  groups. 
The  pooling  of  information  has  the  greatest  impact  when  there  are  few 
degrees  of  freedom  for  variance  estimation  based  on  beaker  averages  within 
groups  and  there  is  little  beaker-to-beaker  heterogeneity. 

We  now  consider  various  hypothesis  testing  and  multiple  comparison 
procedures  to  compare  length  and  reproduction  responses  across  groups. 

Each of the hypothesis testing procedures considered in Section IX for mortality responses has directly analogous counterparts appropriate for quantitative responses such as length and reproduction. Chi square tests correspond to analysis of variance tests. Measure of association tests correspond to inferences about the correlation coefficient. The Cochran-Armitage Test corresponds to a test based on straight-line regression trend. Dunnett's and Williams' multiple comparison procedures carry over directly. However, since the trends in responses are not necessarily monotone, we do not carry out the test procedures in the same sequential manner that we did for mortality responses, namely peeling off the highest treatment group and retesting after each significant result. Also, Dunnett's procedure is used for multiple comparisons rather than Williams' procedure.


REPRODUCTION  RESPONSES— ANALYSIS  OF  VARIANCE  AND  MULTIPLE  COMPARISON 
PROCEDURES 


The analysis of variance procedures are based on the one-way analysis of variance model. We used this procedure in Subsection VI-C when we were looking for outliers. See, for example, Figures VI.3, VI.6, VI.9, VI.12, and VI.15. The tests we use in this section are directly analogous, except that we delete some groups or some individual responses because of excess mortality or because of outliers. The beaker is the basic response unit.
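The one-way analysis of variance quantities in the tables of this subsection can be sketched as follows. This is a minimal pure-Python illustration (names hypothetical) of the textbook between-group and within-group decomposition, with beaker-level responses as the observations; the helper f_from_table recomputes an F ratio from printed sums of squares and d.f., for example those in the LeBlanc Test A solvent control and Goulden isophorone tables below.

```python
def one_way_anova(groups):
    """Return (SS between, df between, SS within, df within, F) for a list of
    response lists, one list per group (e.g. offspring per surviving daphnid by beaker)."""
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / len(all_obs)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(all_obs) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f


def f_from_table(ss_between, df_between, ss_within, df_within):
    """F ratio recomputed from the sums of squares and d.f. in a printed table."""
    return (ss_between / df_between) / (ss_within / df_within)


toy = one_way_anova([[1, 2, 3], [4, 5, 6]])          # hypothetical data
solvent_f = f_from_table(6002.7, 4, 8550.25, 15)      # LeBlanc Test A, solvent control
goulden_f = f_from_table(26414.46, 4, 5234.45, 28)    # Goulden isophorone test
print(toy, round(solvent_f, 3), round(goulden_f, 2))
```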

We  present  several  illustrative  examples  below. 


LeBlanc  Test  A — Water  Control  Group — Group  7  Deleted — Analysis  of  Variance 


There  are  4  treatment  groups  and  a  control  group,  with  4  beakers  per 
group.  There  are  thus  4  degrees  of  freedom  for  comparisons  among  groups 
and  15  degrees  of  freedom  with  which  to  estimate  error.  The  analysis  of 
variance  test  is  a  "shotgun  test",  in  the  same  manner  as  the  chi  square 
test  for  mortality  responses.  The  analysis  of  variance  table  is 

21 Day Cumulative Offspring Per Surviving Daphnid

Source            D.F.   Sum of Squares   Mean Square   F Ratio
Between Groups      4         4787.3        1196.825       1.97
Within Groups      15         9115.25        607.68
Adjusted Total     19        13902.55


The upper 90 percent point of the F distribution with degrees of freedom 4 and 15 is 2.36. Since the observed F ratio, 1.97, falls below this value, there is no statistical evidence from this test of differences in reproduction rates among groups, after excluding the very high mortality group.


LeBlanc  Test  A — Solvent  Control  Group — Group  7  Deleted — Analysis  of 
Variance 


The  framework  is  the  same  as  that  above,  except  that  the  solvent 
control  group  responses  are  substituted  for  the  water  control  group 
responses.  The  analysis  of  variance  table  is 

21 Day Cumulative Offspring Per Surviving Daphnid

Source            D.F.   Sum of Squares   Mean Square   F Ratio
Between Groups      4         6002.7        1500.675       2.633
Within Groups      15         8550.25        570.017
Adjusted Total     19        14552.95

The upper 90 and 95 percent points of the F distribution with degrees of freedom 4 and 15 are 2.36 and 3.06, respectively. Since the observed F ratio falls between these values, we conclude that there is a suggestion, but not strong statistical evidence, of average differences in reproduction rates among groups.

Note  that  there  is  a  somewhat  different  outcome  with  this  test, 
depending  on  whether  the  solvent  or  the  water  control  group  is  used. 

LeBlanc  Test  B — Combined  Control  Groups — Analysis  of  Variance 

The  framework  is  similar  to  that  for  Test  A  except  that  no  treatment 
groups  are  deleted  and  the  solvent  and  water  control  group  responses  are 
combined  into  a  single  group.  The  analysis  of  variance  table  is 


21 Day Cumulative Offspring Per Surviving Daphnid

Source            D.F.   Sum of Squares   Mean Square   F Ratio
Between Groups      5         2542.714       508.54        1.149
Within Groups      22         9738.000       442.64
Adjusted Total     27        12280.714

The  upper  90  percent  point  of  the  F  distribution  with  degrees  of  freedom  5 
and  22  is  2.13.  Thus  there  is  no  statistical  evidence  based  on  this  test 
of  differences  in  average  reproduction  rates  among  groups. 


Goulden  Isophorone  Test — Group  6  Deleted — Outlier  in  Group  1  Deleted — 
Analysis  of  Variance 


These  comparisons  are  based  on  the  reproduction  responses  from  the 
individually  housed  daphnids  in  each  group  that  survived  to  the  end  of  the 
test. There were 7 such daphnids per group at the outset of the test. All but one of these daphnids survived. Beaker 5 in Group 1 was determined in Subsection VI-C to have an outlying response, so it was also deleted.

The  comparison  is  thus  based  on  the  individual  responses  from  33  daphnids. 
The  analysis  of  variance  table  is 


21 Day Cumulative Offspring for Each Daphnid Surviving to the End of the Test

Source            D.F.   Sum of Squares   Mean Square   F Ratio
Between Groups      4        26414.46       6603.62       35.32
Within Groups      28         5234.45        186.95
Adjusted Total     32        31648.91

This F ratio is of course highly statistically significant. There is thus strong statistical evidence of average differences in reproduction levels among groups. This is not very surprising, given the appearance of Figure 11.22.
The previous discussion in this subsection has been based on one-way analysis of variance tests. These are overall, "shotgun" tests and so are not the most sensitive to the types of departures from homogeneity that are of most importance in toxicity tests. In particular, it is often desired to carry out a series of pairwise treatment group-control group comparisons and determine which treatment group responses are significantly different from the control group response. Since such pairwise comparisons focus on the effects of interest, they are more sensitive tests than the overall analysis of variance F test.

Two commonly used procedures for carrying out treatment group-control group multiple comparisons are Williams' test and Dunnett's test. See Williams [17,18] and Dunnett [15,16] for detailed descriptions of their use and for appropriate tables. We have applied Williams' test to study the mortality data. Williams' test assumes that the response curve varies monotonically with concentration. Since this assumption is not necessarily valid for reproduction, we use Dunnett's procedure instead. We present several examples below, based on the same data sets as those discussed above with the analysis of variance tests.


LeBlanc  Test  A — Water  Control  Group — Group  7  Deleted — Dunnett's  Test 

The numbers of beakers and the average cumulative offspring per beaker within each group are

Group          1        3        4        5        6
N              4        4        4        4        4
Average     102.5   128.25     129    139.75   101.25


The standard errors of these averages, based on the analysis of variance fit, are $(\hat\sigma^2/4)^{1/2} = (607.68/4)^{1/2} = 12.33$. Let $\bar X_i$ denote the average in the i-th group.

We apply Dunnett's procedure to determine which groups have statistically significantly lower average reproduction than the control group. We declare the group i average reproduction to be significantly lower than the control average if


$$\bar X_i - \bar X_1 < -t\,(2\hat\sigma^2/4)^{1/2}$$


The factor $t$ is obtained from Dunnett's tables of one-sided factors and is derived under the assumption of equal group sample sizes. In this example, $\sigma^2$ is estimated with 15 degrees of freedom. The value of $t$ corresponding to 15 degrees of freedom, 4 treatment groups, and $\alpha = 0.05$ is $t = 2.36$. Therefore the critical value is


$$\bar X_1 - t\,(2\hat\sigma^2/4)^{1/2} = 102.5 - (2.36)(17.431) = 61.36$$


Since every treatment group average (the smallest is 101.25) exceeds 61.36, none of the treatment group averages is significantly lower than the control group average. The results of this test thus agree with those based on the analysis of variance test.
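The comparison just described can be written as a short function. This minimal Python sketch assumes equal group sizes $n$, a pooled error variance $\hat\sigma^2$ from the analysis of variance, and a one-sided Dunnett factor $t$ that is supplied from published tables (the code does not compute $t$). The function name `dunnett_lower` is hypothetical.

```python
import math


def dunnett_lower(control_mean, group_means, sigma2, n, t):
    """One-sided Dunnett comparison with equal group sizes.

    Returns the critical value and the list of groups whose averages fall
    below it, i.e. are significantly lower than the control average.
    """
    critical = control_mean - t * math.sqrt(2.0 * sigma2 / n)
    lower = [g for g, mean in group_means.items() if mean < critical]
    return critical, lower


# LeBlanc Test A, water control group (Group 1), Group 7 deleted
treatments = {3: 128.25, 4: 129.0, 5: 139.75, 6: 101.25}
crit, lower = dunnett_lower(102.5, treatments, sigma2=607.68, n=4, t=2.36)
print(f"critical value = {crit:.2f}; significantly lower groups: {lower}")
```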

LeBlanc  Test  A — Solvent  Control  Group — Group  7  Deleted — Dunnett's  Test 


The numbers of beakers and the average cumulative offspring per beaker within each group are the same as shown above, except that Group 2 is substituted for Group 1. The values for Group 2 are N = 4 and Average = 154. The standard errors of these averages, based on the analysis of variance fit, are $(\hat\sigma^2/4)^{1/2} = (570.017/4)^{1/2} = 11.94$. Let $\bar X_i$ denote the average in the i-th group. The group i average reproduction is significantly lower than the control average if


$$\bar X_i - \bar X_2 < -t\,(2\hat\sigma^2/4)^{1/2}$$


As above, the estimated variance $\hat\sigma^2$ has 15 degrees of freedom. The $t$ factor corresponding to 15 degrees of freedom, 4 treatment groups, and $\alpha = 0.05$ is $t = 2.36$. Thus the critical value is


$$\bar X_2 - t\,(2\hat\sigma^2/4)^{1/2} = 154 - (2.36)(16.88) = 114.16$$


Since the Group 6 average (101.25) is below this value, the average reproduction rate in Group 6 is significantly smaller than that in the solvent control group at $\alpha = 0.05$. We can repeat this test after deleting Group 6 and adjusting $t$ to correspond to 3 treatment groups. The new critical value is $154 - (2.24)(16.88) = 116.19$. Thus no other groups have significantly lower average reproduction than the solvent control group.
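The repeated test in the preceding paragraph is a step-down use of Dunnett's procedure: once groups are declared significant, they are set aside and the remaining groups are compared using the factor for the smaller number of treatment groups. A minimal sketch follows, assuming equal group sizes and that the one-sided Dunnett factors (here 2.36 for four groups and 2.24 for three, with 15 degrees of freedom and $\alpha = 0.05$, as quoted in the text) are supplied from tables. The function name and the dictionary of factors are our own.

```python
import math


def dunnett_step_down(control_mean, group_means, sigma2, n, t_factors):
    """Step-down one-sided Dunnett comparisons with equal group sizes.

    t_factors maps the number of treatment groups still under test to the
    tabulated one-sided Dunnett factor; it must contain an entry for every
    group count that can arise, or a KeyError is raised.
    """
    se_diff = math.sqrt(2.0 * sigma2 / n)  # standard error of a difference of two averages
    remaining = dict(group_means)
    significant = []
    while remaining:
        critical = control_mean - t_factors[len(remaining)] * se_diff
        newly = sorted(g for g, mean in remaining.items() if mean < critical)
        if not newly:
            break  # no further groups fall below the critical value
        significant.extend(newly)
        for g in newly:
            del remaining[g]
    return significant


# LeBlanc Test A, solvent control group (Group 2), Group 7 deleted
treatments = {3: 128.25, 4: 129.0, 5: 139.75, 6: 101.25}
print(dunnett_step_down(154.0, treatments, 570.017, 4, {4: 2.36, 3: 2.24}))
```

With the water control group (average 102.5, $\hat\sigma^2 = 607.68$) the same call returns an empty list, matching the earlier example.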

We  come  to  different  conclusions,  depending  on  whether  the  water 
control  group  or  the  solvent  control  group  is  used  for  comparison.  This 
raises  important  conceptual  problems,  as  discussed  in  Section  VII. 


LeBlanc  Test  B — Combined  Control  Groups — Dunnett's  Test 


There are 5 treatment groups and a (combined) control group. The numbers of beakers and the average cumulative offspring per beaker are

Group       1 and 2      3        4        5        6        7
N              8         4        4        4        4        4
Average     109.25      116     134.5   128.25   119.25     108


The estimated error variance, based on the analysis of variance fit, is $\hat\sigma^2 = 442.64$ with 22 degrees of freedom. Let $\bar X_i$ denote the average in the i-th group. The group i average reproduction is significantly lower than the control average if

$$\bar X_i - \bar X_0 < -t\,\hat\sigma\,(1/N_i + 1/N_0)^{1/2}$$

where $\bar X_0$ and $N_0$ are the average reproduction and the number of beakers in the combined control group. In this example, $N_i = 4$ and $N_0 = 8$.

The tabulations of Dunnett's factors are based on the assumption of equal sample sizes in each group. Since the control group has twice as many beakers as any of the treatment groups, the appropriate values of $t$ must be obtained from tables of the multivariate t distribution.

Krishnaiah [29], pp. 789-800, has published tables of this distribution. For $i \ne j$ the correlation between $\bar X_i - \bar X_0$ and $\bar X_j - \bar X_0$ is $(1/N_0)/(1/N_i + 1/N_0) = (1/8)/(3/8) = 1/3$, since the two differences share the control average. We must enter the tables of the multivariate t distribution at 22 degrees of freedom, 5 groups, and correlation $\rho = 0.33$ to find the upper $\alpha = 0.05$ point.

Interpolating in Krishnaiah's tables between $\rho = 0.2$ and $\rho = 0.4$ yields $t = 2.43$.


Thus  the  critical  value  is 


$$\bar X_0 - t\,\hat\sigma\,(1/N_i + 1/N_0)^{1/2} = 109.25 - (2.43)(21.04)(1/4 + 1/8)^{1/2} = 77.94$$

Since the control group has one of the lowest reproduction rates among all the groups, there are obviously no groups significantly lower than the control.
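When the control group is larger than the treatment groups, the required point of the multivariate t distribution can also be approximated by simulation instead of interpolating in Krishnaiah's tables. The sketch below is our own illustration, not the procedure used in this report: it draws equicorrelated standard normal variables with correlation $\rho = (1/N_0)/(1/N_i + 1/N_0)$, divides them by a common estimate of $\sigma$ with 22 degrees of freedom, and takes the upper 5 percent point of the maximum over the 5 treatment groups. The result carries Monte Carlo error and should be close to, though not identical with, the interpolated value $t = 2.43$.

```python
import math
import random


def multivariate_t_upper_point(k, rho, df, alpha=0.05, draws=100_000, seed=1):
    """Monte Carlo estimate of the one-sided upper alpha point of max_i T_i,
    where T_1..T_k are equicorrelated multivariate t variables."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    maxima = []
    for _ in range(draws):
        w = rng.gauss(0.0, 1.0)  # shared component gives pairwise correlation rho
        z_max = max(a * w + b * rng.gauss(0.0, 1.0) for _ in range(k))
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        maxima.append(z_max / math.sqrt(chi2 / df))  # common denominator for all k groups
    maxima.sort()
    return maxima[int(math.ceil((1.0 - alpha) * draws)) - 1]


n_i, n_0 = 4, 8
rho = (1 / n_0) / (1 / n_i + 1 / n_0)  # correlation between Xbar_i - Xbar_0 and Xbar_j - Xbar_0
t = multivariate_t_upper_point(k=5, rho=rho, df=22)
critical = 109.25 - t * math.sqrt(442.64) * math.sqrt(1 / n_i + 1 / n_0)
print(f"rho = {rho:.3f}, simulated t = {t:.2f}, critical value = {critical:.2f}")
```

With $k = 1$ the same routine reduces to Student's t, which is a convenient check on the simulation.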


Goulden  Isophorone  Test — Group  6  Deleted — Outlier  in  Group  1  Deleted — 
Dunnett's  Test 


The numbers of beakers and the average cumulative offspring per beaker within each group are

Group          1        2        3        4        5
N              6        6        7        7        7
Average     67.667   86.833   90.857   69.000   13.286


Each beaker contains just one daphnid, and attention is confined to those daphnids that survived to the end of the test. The estimated error variance, based on the analysis of variance fit, is $\hat\sigma^2 = 186.95$ with 28 degrees of freedom. Let $\bar X_i$ denote the average in the i-th group. The group i average reproduction is significantly lower than the control average if

$$\bar X_i - \bar X_1 < -t\,\hat\sigma\,(1/N_i + 1/N_1)^{1/2}$$

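Even though the sample sizes are not exactly equal, they are almost equal, and so we use Dunnett's factors as an approximation. With 28 degrees of freedom, 4 treatment groups, and $\alpha = 0.05$, the factor is $t = 2.26$, and $\hat\sigma = (186.95)^{1/2} = 13.67$. Thus the critical value is

$$\bar X_1 - t\,\hat\sigma\,(1/N_i + 1/N_1)^{1/2} = 67.667 - (2.26)(13.67)(1/N_i + 1/6)^{1/2}$$

which equals 49.83 for i = 2 and 50.48 for i = 3, 4, 5.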

Thus Group 5 (average 13.286) has a significantly smaller reproduction rate than the control group. Obviously none of the other groups do.
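For unequal group sizes the standard error of each difference changes with $N_i$, so each treatment group gets its own critical value while a single tabulated factor is used as an approximation. A minimal sketch of that calculation, with a hypothetical function name and $t = 2.26$ taken from the text rather than computed:

```python
import math


def dunnett_unequal_n(control_mean, n_control, groups, sigma2, t):
    """Critical values for one-sided treatment-vs-control comparisons with
    unequal group sizes, using a single (approximate) Dunnett factor t.

    groups maps group label -> (number of daphnids, average reproduction).
    Returns {label: (critical value, significantly lower?)}.
    """
    sigma = math.sqrt(sigma2)
    results = {}
    for label, (n_i, mean) in groups.items():
        critical = control_mean - t * sigma * math.sqrt(1.0 / n_i + 1.0 / n_control)
        results[label] = (critical, mean < critical)
    return results


# Goulden isophorone test, Group 6 and the Group 1 outlier deleted; Group 1 is the control
groups = {2: (6, 86.833), 3: (7, 90.857), 4: (7, 69.000), 5: (7, 13.286)}
for label, (crit, lower) in dunnett_unequal_n(67.667, 6, groups, 186.95, t=2.26).items():
    print(f"Group {label}: critical value {crit:.2f}, significantly lower: {lower}")
```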


C.  LENGTH  RESPONSES— ANALYSIS  OF  VARIANCE  AND  MULTIPLE  COMPARISON 
PROCEDURES 


The  situation  for  comparisons  of  lengths  across  groups  is  similar  to 
that  for  comparisons  of  reproduction,  except  that  lengths  are  measured  on  a 
per  daphnid  basis  whereas  reproduction  is  measured  on  a  per  beaker  basis. 
Thus  unless  there  is  just  one  daphnid  per  beaker,  less  information  is 
obtained  on  reproduction  than  on  lengths. 

A  basic  technical  difficulty  associated  with  directly  analyzing  the 
individual  length  determinations  is  due  to  the  correlations  among  responses 
from  daphnids  in  the  same  beaker.  This  intercorrelation  results  from  the 
beaker-to-beaker  heterogeneity  within  groups.  A  common  practice  is  to 
summarize  the  individual  length  measurements  within  each  beaker  by  the 
average  length  and  then  analyze  the  averages  on  a  per  beaker  basis,  as  was 
previously  done  for  the  reproduction  responses.  This  is  essentially 
equivalent  to  fitting  a  two-way  nested  analysis  of  variance  model  to  the 
data,  as  was  explained  and  illustrated  in  Subsection  IV-B,  and  using  the 
mean  square  for  variation  among  beakers  within  groups  as  the  error  term  for 
making  inferences  about  treatment  effects. 

If the mean square for beakers within groups is no greater than the mean square for variation among daphnids within beakers, then the two mean squares are sometimes pooled and used as a common error term. The latter mean square usually has many more degrees of freedom than the former. In Subsection V-D we discussed a scheme for pooling information from these two mean squares in a continuous manner. Basically, the mean square for beakers within groups is used as the error yardstick, but information from the within-beaker mean square is used to augment its degrees of freedom in a continuous manner: the closer the two mean squares are, the greater the degrees of freedom. See Subsection V-D for details. The results of this pooling procedure applied to the length responses from LeBlanc's Tests A and B are:

Test A: $\hat\sigma^2 = 0.2722$ with 98 degrees of freedom (excluding Group 7)

Test B: $\hat\sigma^2 = 0.2707$ with 141 degrees of freedom

We  use  these  error  estimates  in  subsequent  analyses. 

The  analysis  of  variance  procedures  are  based  on  the  two-way  nested 
analysis  of  variance  model.  Although  we  should  separate  the  control  groups 
in  Test  A,  combine  the  control  groups  in  Test  B,  and  re-estimate  the  error 
variances  from  the  modified  data,  we  utilize  the  analysis  of  variance  fits 
shown  in  Subsection  IV-B  to  illustrate  the  calculation  of  the  analysis  of 
variance  tests.  Although  these  analyses  of  variance  do  not  test  quite  the 
right  hypotheses,  they  do  illustrate  the  appropriate  methodology. 


LeBlanc  Test  A — Analysis  of  Variance  Test 

From Subsection IV-B, the mean square between groups is 1.1163 with 5 degrees of freedom, and the mean square for beakers within groups is 0.2722 with 98 degrees of freedom. Thus

$$F = 1.1163/0.2722 = 4.101$$

with 5 and 98 degrees of freedom.

This F ratio is statistically significant at the $\alpha = 0.002$ level. Thus there is strong statistical evidence of differences in average lengths among groups. From Figure 11.26 it appears that the average length in the solvent control group is somewhat greater than the average lengths in the other groups.


LeBlanc  Test  B — Analysis  of  Variance  Test 


From Subsection IV-B, the mean square between groups is 0.2915 with 6 degrees of freedom, and the mean square for beakers within groups is 0.2707 with 141 degrees of freedom. Thus

$$F = 0.2915/0.2707 = 1.077$$

with 6 and 141 degrees of freedom.


This F ratio is significant only at the $\alpha = 0.38$ level. Thus there is no statistical evidence of differences in average lengths among groups. The relatively small lengths in Group 7 do not show up in this test. See Figure 11.29 for a graphical display of the group-to-group variation in lengths.

We now carry out pairwise comparisons of treatment group and control group average responses using Dunnett's test. Lengths were measured by LeBlanc in Tests A and B and by Chapman in his beryllium test. For the LeBlanc data we use the variance estimates and degrees of freedom arrived at in Subsection V-D, namely $\hat\sigma^2 = 0.2722$ with 98 degrees of freedom for Test A (excluding Group 7) and $\hat\sigma^2 = 0.2707$ with 141 degrees of freedom for Test B.

Before  considering  the  examples  below,  we  must  address  a  technical 
issue  concerning  the  estimation  of  the  average  responses  within  each  group. 
There  are  two  components  of  variation,  a  beaker-to-beaker  component  and  a 
daphnid-to-daphnid  component  within  beakers.  The  question  arises  as  to 
whether  we  should  calculate  a  simple  average  of  all  the  daphnid  responses 
within  each  group,  calculate  the  average  of  the  average  beaker  responses, 
or  perhaps  some  compromise  between  these  two  averages.  In  the  balanced 
situation  where  each  beaker  has  the  same  number  of  daphnids  these  two 
averaging  processes  yield  the  same  results  and  there  is  no  ambiguity. 
However, in the unbalanced case these averaging processes can yield very
different  results.  In  LeBlanc's  Tests  A  and  B,  most  of  the  beakers  within 
groups  have  between  15  and  20  daphnids  and  so  averaging  processes  based  on 
the  balanced  case  should  be  quite  reasonable.  We  thus  use  unweighted 
averages  of  the  individual  responses  in  these  groups.  However  the 
situations  in  Group  6  of  Test  A  and  in  Group  7  of  Test  B  are  different. 

The sample sizes in the four beakers in Group 6 of Test A are 18, 16, 5, and 1, and the sample sizes in the four beakers in Group 7 of Test B are 3, 14, 12, and 1. These are very highly imbalanced. How, then, should the average responses be estimated in these groups? A complete answer to this question is a research problem in its own right, and we will not attempt it here. However, an intuitively reasonable approach would be to calculate the average of the responses that most precisely estimates mean length within the group. Let $\sigma_B^2$ and $\sigma_D^2$ denote the variance components due to beakers and daphnids, respectively, and let $\rho = \sigma_B^2/\sigma_D^2$. Suppose there are $J$ beakers, $N_j$ daphnids within the j-th beaker, and the average response in the j-th beaker is $\bar X_j$. Then $\mathrm{Var}(\bar X_j) = \sigma_B^2 + \sigma_D^2/N_j$. We consider estimates of the form

$$\hat\mu = \sum_{j=1}^{J} w_j \bar X_j, \qquad w_j > 0, \quad \sum_{j=1}^{J} w_j = 1$$

We choose the weights so as to minimize the variance of $\hat\mu$. This is a Lagrange multiplier problem. The solution is to choose $w_j$ proportional to $1/\mathrm{Var}(\bar X_j)$, that is, proportional to $N_j/(1 + N_j\rho)$. As $\rho$ approaches 0, we tend to average individual responses, and as $\rho$ approaches infinity we tend to average beaker averages. The general situation is a compromise between these two extremes. In the LeBlanc data sets, $\rho$ is estimated to be 0.09 in Test A and 0.06 in Test B. Since these values of $\rho$ are small, we use simple averages of the individual observations (weights proportional to $N_j$), although the "optimum" weights call for assigning relatively lower weights to the beakers with large numbers of daphnids. The variance of this simple average is $\sigma_D^2/\sum_j N_j + \sigma_B^2 \sum_j N_j^2/(\sum_j N_j)^2$. We use this variance expression for the two groups with highly unbalanced sample sizes. For the other groups, we use variance expressions appropriate for the balanced case, as these should be reasonable approximations. A more rigorous analysis could be based on maximum likelihood methods, but even here the theory holds only asymptotically.
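The weighting argument above can be checked numerically. The sketch below, with function names of our own choosing, computes the minimum-variance weights $w_j \propto N_j/(1 + N_j\rho)$ and compares the variance of the resulting estimate with that of the simple average, using the Group 6 beaker sizes from Test A and $\rho = 0.09$. Variances are expressed in units of $\sigma_D^2$ (so $\sigma_B^2 = \rho$); this scaling is our assumption for illustration, since only the ratio $\rho$ affects the weights.

```python
def beaker_weights(sizes, rho):
    """Minimum-variance weights w_j proportional to N_j / (1 + N_j * rho)."""
    raw = [n / (1.0 + n * rho) for n in sizes]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean_variance(sizes, weights, var_beaker, var_daphnid):
    """Variance of sum_j w_j * Xbar_j, where Var(Xbar_j) = var_beaker + var_daphnid / N_j."""
    return sum(w * w * (var_beaker + var_daphnid / n) for w, n in zip(weights, sizes))


def simple_average_variance(sizes, var_beaker, var_daphnid):
    """Variance of the simple average of all daphnids:
    sigma_D^2 / sum(N) + sigma_B^2 * sum(N^2) / sum(N)^2."""
    total = sum(sizes)
    return var_daphnid / total + var_beaker * sum(n * n for n in sizes) / total ** 2


sizes = [18, 16, 5, 1]   # beaker sizes in Group 6 of Test A
rho = 0.09               # estimated ratio sigma_B^2 / sigma_D^2 for Test A
w_opt = beaker_weights(sizes, rho)
w_simple = [n / sum(sizes) for n in sizes]
v_opt = weighted_mean_variance(sizes, w_opt, rho, 1.0)
v_simple = simple_average_variance(sizes, rho, 1.0)
print([round(w, 3) for w in w_opt], [round(w, 3) for w in w_simple])
print(f"variance ratio (optimal / simple): {v_opt / v_simple:.2f}")
```

Setting $\rho = 0$ makes the weights proportional to $N_j$, and a very large $\rho$ makes them nearly equal, matching the two limiting cases described above.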

We  now  consider  several  examples. 


LeBlanc  Test  A — Water  Control  Group — Group  7  Deleted — Dunnett's  Test 


The numbers and average lengths of surviving daphnids within each group, and the standard errors of the means, are

Group           1        3        4        5        6
N              68       70       73       73       40
Average     4.9676   4.9743   5.1027   4.9658   5.0000
Std Error    0.063    0.062    0.061    0.061    0.080

The standard error estimates in Groups 1-5 are the square root of the beakers-within-groups mean square shown in Subsection V-D divided by $N_i$, that is, $(\hat\sigma^2/N_i)^{1/2}$. The standard error estimate in Group 6 is based on the expression in the paragraph preceding this example. We associate 98 degrees of freedom with these standard error estimates. Let $\bar X_i$ denote the average in the i-th group. The group i average length is significantly lower than the control average if


$$\bar X_i - \bar X_1 < -t\,(\text{std err}_i^2 + \text{std err}_1^2)^{1/2}$$

The $t$ factor corresponding to 98 degrees of freedom, 4 treatment groups, and $\alpha = 0.05$ is $t = 2.19$. Thus the critical values are:


Group 3 vs Control: $4.9676 - (2.19)(0.063^2 + 0.062^2)^{1/2} = 4.774$
Group 4 vs Control: $4.9676 - (2.19)(0.063^2 + 0.061^2)^{1/2} = 4.776$
Group 5 vs Control: $4.9676 - (2.19)(0.063^2 + 0.061^2)^{1/2} = 4.776$
Group 6 vs Control: $4.9676 - (2.19)(0.063^2 + 0.080^2)^{1/2} = 4.745$


Since none of the group averages is below its critical value, we conclude that there is no statistical evidence that any of the treatment groups has a significantly lower average length than the water control group.
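When each average carries its own standard error, the comparison uses $(\text{std err}_i^2 + \text{std err}_1^2)^{1/2}$ in place of a pooled standard error. A minimal sketch of that rule follows; the function name is hypothetical, and the factor $t$ is supplied from tables as in the text.

```python
import math


def dunnett_with_std_errors(control_mean, control_se, groups, t):
    """One-sided treatment-vs-control comparisons when each average has its own
    standard error. groups maps label -> (average, standard error).
    Returns {label: (critical value, significantly lower?)}."""
    out = {}
    for label, (mean, se) in groups.items():
        critical = control_mean - t * math.sqrt(se ** 2 + control_se ** 2)
        out[label] = (critical, mean < critical)
    return out


# LeBlanc Test A lengths, water control (Group 1), Group 7 deleted
groups = {3: (4.9743, 0.062), 4: (5.1027, 0.061), 5: (4.9658, 0.061), 6: (5.0000, 0.080)}
for label, (crit, lower) in dunnett_with_std_errors(4.9676, 0.063, groups, t=2.19).items():
    print(f"Group {label} vs Control: critical value {crit:.3f}, significantly lower: {lower}")
```

Calling the same function with the solvent control values (5.2753 and 0.059) reproduces the comparisons in the next example.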


LeBlanc Test A — Solvent Control Group — Group 7 Deleted — Dunnett's Test


The numbers and average lengths of surviving daphnids within each group and the standard errors are the same as those shown above, except that Group 2 is substituted for Group 1. The values for Group 2 are N = 77, Average = 5.2753, Std Error = 0.059. The group i average length, $\bar X_i$, is significantly lower than the solvent control group average if


$$\bar X_i - \bar X_2 < -t\,(\text{std err}_i^2 + \text{std err}_2^2)^{1/2}$$


The  t  factor  is  again  2.19.  Thus  the  critical  values  are: 


Group 3 vs Control: $5.2753 - (2.19)(0.059^2 + 0.062^2)^{1/2} = 5.088$
Group 4 vs Control: $5.2753 - (2.19)(0.059^2 + 0.061^2)^{1/2} = 5.089$
Group 5 vs Control: $5.2753 - (2.19)(0.059^2 + 0.061^2)^{1/2} = 5.089$
Group 6 vs Control: $5.2753 - (2.19)(0.059^2 + 0.080^2)^{1/2} = 5.058$


The averages in Groups 3, 5, and 6 are below their critical values, and so these averages are significantly lower than the control group average. The Group 4 average (5.1027) is above its critical value (5.089).

Since  Groups  3,  5,  and  6  differ  significantly  from  the  control  group, 
we  recalculate  the  critical  value  for  Group  4  based  on  just  one  treatment 
group  (i.e.,  the  usual  Student's  t  distribution).  The  t  factor  changes 
from  2.19  to  1.66.  Thus  the  critical  value  for  Group  4  vs  Control  is 

$$5.2753 - (1.66)(0.059^2 + 0.061^2)^{1/2} = 5.134$$

The Group 4 average is lower than this value and thus is also significantly lower than the control.

We conclude that all of the treatment groups have significantly lower average lengths than the solvent control group, at the $\alpha = 0.05$ level of significance.

Note that we have arrived at diametrically opposite conclusions, depending on whether the water control group or the solvent control group is used for comparison. This situation is clearly seen in Figure 11.26. As remarked previously, these contradictory conclusions lead to important conceptual problems in interpreting the results of the test.


LeBlanc Test B — Combined Control Groups — Dunnett's Test


The numbers and average lengths of surviving daphnids in each group, and the standard errors of the means, are

Group        1 and 2      3        4        5        6        7
N              139       76       75       67       72       30
Average      4.8907   4.8645   4.9360   4.8358   4.8333   4.7267
Std Error     0.044    0.060    0.060    0.064    0.061    0.088

The standard error estimates in the control group and in treatment Groups 3-6 are the square roots of the beakers-within-groups mean square shown in Subsection V-D divided by $N_i$. The standard error estimate in Group 7 is based on the expression in the paragraph preceding the first example in this series. We associate 141 degrees of freedom with these standard error estimates. Let $\bar X_i$ and $\bar X_0$ denote the averages in the i-th treatment group and in the combined control group, respectively. The group i average length is significantly lower than the control average if

$$\bar X_i - \bar X_0 < -t\,(\text{std err}_i^2 + \text{std err}_0^2)^{1/2}$$


The $t$ factor corresponding to 141 degrees of freedom, 5 treatment groups, and $\alpha = 0.05$ is obtained from Krishnaiah's tables of the multivariate t distribution, pp. 789-800. Since the numbers of beakers and daphnids in the combined control group are about twice those in the treatment groups (except for Group 7), the correlation between $\bar X_i - \bar X_0$ and $\bar X_j - \bar X_0$ is about 1/3 for $i \ne j$ (with $i, j \ne 7$). We enter Krishnaiah's tables at 141 degrees of freedom, 5 groups, and correlation $\rho = 0.33$ to find the upper $\alpha = 0.05$ point. Interpolating between $\rho = 0.2$ and $\rho = 0.4$ yields $t = 2.37$. (Krishnaiah's tables actually extend only to 35 degrees of freedom; because these critical points decrease as the degrees of freedom increase, a value taken at 35 degrees of freedom is somewhat conservative for 141.) Thus the critical values are:


Group 3 vs Control: $4.8907 - (2.37)(0.044^2 + 0.060^2)^{1/2} = 4.714$
Group 4 vs Control: $4.8907 - (2.37)(0.044^2 + 0.060^2)^{1/2} = 4.714$
Group 5 vs Control: $4.8907 - (2.37)(0.044^2 + 0.064^2)^{1/2} = 4.707$
Group 6 vs Control: $4.8907 - (2.37)(0.044^2 + 0.061^2)^{1/2} = 4.712$
Group 7 vs Control: $4.8907 - (2.37)(0.044^2 + 0.088^2)^{1/2} = 4.658$


Since none of the group averages is less than its critical value, we conclude that there is no statistical evidence that any of the treatment groups has a significantly lower average length than the combined control group. The appearance of Figure 11.29 bears out this conclusion, except for Group 7, which is a bit lower; however, it is not significantly lower. The analysis of variance test gave the same conclusion.
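The statement that the correlation between $\bar X_i - \bar X_0$ and $\bar X_j - \bar X_0$ is about 1/3 follows from the standard errors: because the two differences share the control average, their covariance is $\text{std err}_0^2$, so their correlation is $\text{std err}_0^2 / [(\text{std err}_i^2 + \text{std err}_0^2)(\text{std err}_j^2 + \text{std err}_0^2)]^{1/2}$. The short sketch below (a helper of our own, assuming the group averages are independent) evaluates these correlations for the standard errors in the table above.

```python
import math
from itertools import combinations


def comparison_correlations(control_se, group_ses):
    """Correlations between the differences Xbar_i - Xbar_0 that share a control average."""
    corr = {}
    for i, j in combinations(sorted(group_ses), 2):
        var_i = group_ses[i] ** 2 + control_se ** 2
        var_j = group_ses[j] ** 2 + control_se ** 2
        corr[(i, j)] = control_se ** 2 / math.sqrt(var_i * var_j)
    return corr


ses = {3: 0.060, 4: 0.060, 5: 0.064, 6: 0.061, 7: 0.088}
for pair, r in comparison_correlations(0.044, ses).items():
    print(pair, round(r, 2))
```

Pairs not involving Group 7 come out near 1/3, while pairs involving Group 7, with its larger standard error, have noticeably smaller correlations.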


XII.  DOSE  RESPONSE  CURVE  ESTIMATION-MULTIPLE  REGRESSION 
ANALYSIS  OF  LENGTH  AND  REPRODUCTION  DATA 


A.  INTRODUCTION 


In the previous section we considered hypothesis testing and multiple comparison procedures to test for the presence of differences among groups in average length and reproduction. These testing procedures are analogous to those considered in Section IX for mortality responses. In this section we consider fitting multiple regression dose response models to the length and reproduction data to estimate the concentrations, $C_L$, which result in reductions of L percent relative to the control group levels. These multiple regression procedures for quantitative responses are directly analogous to the probit analysis dose response model for qualitative responses that was considered in Section X.

The  conceptual  distinctions  between  the  hypothesis  testing  procedures 
of  the  previous  section  and  the  multiple  regression  procedures  of  this 
section  are  the  same  as  the  distinctions  between  corresponding  procedures 
for  mortality  data  that  are  discussed  at  the  beginning  of  Section  X.  In 
brief,  inferences  based  on  the  multiple  regression  procedures  incorporate 
biological  significance  as  well  as  statistical  significance  and  tend  to 
result  in  tighter  confidence  bounds  on  safe  concentrations  (and  thus  in 
more  liberal  lower  bounds)  as  the  amount  and  precision  of  the  data 
increase.  We  feel  that  this  approach  to  inference  has  more  appeal  than 
hypothesis  testing. 

The  two  principal  technical  differences  between  the  multiple  regression 
models  appropriate  for  studying  length  and  reproduction  responses  and  the 
probit  model  appropriate  for  studying  mortality  responses  relate  to  the 
nonmonotone  nature  of  the  trends  in  length  and  reproduction  and  the  need  to 
supply  estimates  of  variability.  Both  of  these  considerations  were 
discussed  in  Subsection  XI-A  and  so  need  not  be  repeated  here. 

Feder and Collins [1] discuss fitting multiple regression models to weight gain data from early life stage tests with fathead minnows in Subsection XVII-C of their report.

We now discuss the specific models that were fitted to the data. Let $x = \log_{10}(\text{concentration})$. (Since the regression models will be fitted only to the treatment groups, there is no problem with the logarithm of 0.) Let $m$ and $s$ denote location and scale standardization factors, respectively. (Note that these are not necessarily the mean and standard deviation.) Let $v = (x - m)/s$ denote the standardized version of $x$, and let $Y$ denote the response (average length or cumulative reproduction per surviving adult). Let $I$ denote an indicator function of the treatment groups; that is, $I = 1$ for treatment groups and $I = 0$ for control groups. The models fitted are:
Straight Line


$$Y = \mu + \beta_0 I + \beta_1 I v + \varepsilon$$


Quadratic 


$$Y = \mu + \beta_0 I + \beta_1 I v + \beta_2 I v^2 + \varepsilon$$

In these model specifications $\mu$ represents the control group average, and $\beta_0$, $\beta_1$, and (possibly) $\beta_2$ represent the coefficients of the polynomial trend in the region of the treatment groups. By the parameterization of the model, the polynomials $\beta_0 + \beta_1 v$ or $\beta_0 + \beta_1 v + \beta_2 v^2$ represent the difference between the treatment group expected response at $v$ and the control group average. Since the average length and reproduction levels in the control groups generally differ from zero, they must be adjusted for in our models. We wish to estimate the value of $v$ where

$$\beta_0 + \beta_1 v = c$$

or

$$\beta_0 + \beta_1 v + \beta_2 v^2 = c$$

where $c$ is the specified difference from the control group average (negative for a reduction), and to place a confidence interval on this value. The theory underlying the point and confidence interval estimation procedure is discussed in Appendix AXII.1. A computer program to carry out the calculations is described in Appendix AXII.2.
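Solving the fitted polynomial for $v$ and undoing the standardization $x = m + s v$ gives the point estimate of the concentration. The minimal sketch below assumes the fitted coefficients are already available. The coefficient values are hypothetical placeholders chosen only to exercise the code; they are not the estimates in the report's figures. The function name is our own and is not part of the CONFINT program, and the confidence intervals that CONFINT computes are not reproduced here.

```python
import math


def concentration_for_change(b0, b1, b2, c, m, s):
    """Concentrations at which b0 + b1*v + b2*v**2 = c, with v = (x - m)/s and
    x = log10(concentration). Returns a sorted list of 0, 1 or 2 values.
    For the straight-line model pass b2 = 0 (b1 must then be nonzero)."""
    if b2 == 0.0:
        roots_v = [(c - b0) / b1]  # straight-line model
    else:
        disc = b1 * b1 - 4.0 * b2 * (b0 - c)
        if disc < 0.0:
            return []  # the fitted curve never reaches the specified change
        sq = math.sqrt(disc)
        roots_v = [(-b1 - sq) / (2.0 * b2), (-b1 + sq) / (2.0 * b2)]
    return sorted(10.0 ** (m + s * v) for v in roots_v)


# Hypothetical coefficients (NOT the fitted values from Figure XII.1)
b0, b1, b2 = 20.0, 10.0, -15.0
control_mean = 102.5
c = -0.10 * control_mean  # 10 percent reduction relative to the control average
print(concentration_for_change(b0, b1, b2, c, m=-2.257, s=0.311))
```

As in the examples that follow, a quadratic generally yields two roots, and the biologically relevant one must be chosen by inspecting the fitted curve.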

We  now  discuss  fitting  regression  models  to  the  reproduction  responses 
and  to  the  length  responses  in  turn. 


B.  REPRODUCTION  RESPONSES 


In this subsection we fit multiple regression models to study trends in cumulative reproduction as concentration increases. We fit linear or quadratic models, as appropriate, to the treatment group data and estimate concentrations resulting in specified decreases from the control group in average reproduction. The same approach could of course be extended to fit cubic models, exponential models, etc., to the data. Such extensions have not been explored, but are straightforward. Since reproduction responses are measured on a per beaker basis, there are no complications due to beaker-to-beaker heterogeneity. In the Chapman and Goulden tests, reproduction was measured on individually housed daphnids, and only the daphnids that survived to the end of the test were included in the analysis. We now consider the details of the fits to the various data sets.
LeBlanc Test A — Water Control Group — Group 7 Deleted


There are four treatment groups and a control group. The concentrations corresponding to the lowest two treatment groups, Groups 3 and 4, are virtually identical. We are thus fitting a quadratic model to essentially three distinct concentrations (recall that we have excluded the control group). The model fitted is

$$Y = \mu + \beta_0 I + \beta_1 I v + \beta_2 I v^2 + \varepsilon$$

where $v = (x + 2.257)/0.311$ (i.e., $m = -2.257$ and $s = 0.311$). The results of this fit are shown in Figure XII.1. The observed and predicted values are plotted against $\log_{10}(\text{concentration})$ in Figure XII.2. The estimated regression coefficients and their estimated variance-covariance matrix are shown in Figure XII.1. Figure XII.2 shows a nonlinear trend in average response within the range of the treatment groups. The quadratic coefficient in Figure XII.1 is marginally significant ($\alpha = 0.08$). We thus base estimates of acceptable concentrations on the quadratic model.

The output from CONFINT applied to this model is shown in Figures XII.3 and XII.4. We calculate point and confidence interval estimates of concentrations corresponding to 10 percent and 20 percent reductions in average reproduction, relative to the control group. Since the estimated cumulative reproduction rate in the water control group is 102.5 offspring per adult, the values of $c$ specified are -10.25 and -20.5, respectively.

Two roots are shown in each figure. The smaller roots, -2.76 and -2.81 in logarithmic concentration units, are obviously inappropriate given the appearance of Figure XII.2. We thus confine attention to the larger roots, which correspond to root no. 1 in the two outputs. From the bottoms of Figures XII.3 and XII.4 we read these as

10 percent decrease: conc = 0.0168, 95 percent conf interval (0.0107, 0.0262)
20 percent decrease: conc = 0.0187, 95 percent conf interval (0.0116, 0.0300)


These point estimates both exceed the concentration in the highest treatment group, so they represent extrapolations beyond the range of the data, with the consequent possibility of extrapolation bias. However, since the amount of extrapolation is very small, this should not be a problem in this example.


LeBlanc Test A — Solvent Control Group — Group 7 Deleted


The framework is the same as in the previous example except that the solvent control group (Group 2) is substituted for the water control group. There are again essentially three distinct treatment group concentrations. The model fitted is the same as that in the previous example, and the standardization factors are also the same. The results of this fit are shown in Figures XII.5 to XII.8. The quadratic fit to the treatment group responses is the same as in the previous example, but the control group average is somewhat greater (154.0 vs 102.5).

The output from CONFINT applied to this model is shown in Figures XII.7 and XII.8. Since the estimated cumulative reproduction rate in the solvent control group is 154.0 offspring per adult, the values of $c$ corresponding to 10 percent and 20 percent reductions are -15.4 and -30.8, respectively. As in the previous example, the larger of the two roots in each output is the one of biological importance; it corresponds to root no. 1 in the outputs. From the bottoms of Figures XII.7 and XII.8 we read these as


10  percent  decrease:  conc=0.0077  95  percent  conf  interval  (0.0023,  0.0254) 
20  percent  decrease:  conc=0.0110  95  percent  conf  interval  (0.0064,  0.0189) 


Note that the confidence interval for the concentration associated with the 10 percent decrease is much wider than that associated with the 20 percent decrease. This is because the point estimate, x = -2.11 (i.e., log10 of 0.0077), lies near the stationary point of the fitted curve. The slope of the curve is therefore very gentle in this region, so small changes in the estimated response correspond to large changes in concentration.
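The widening can be illustrated with a first-order (delta-method) approximation. This is a swapped-in illustration, not the procedure CONFINT uses. If the fitted curve f(x) has standard error se near the solution x of f(x) = c, the half-width of the interval in x is roughly 1.96*se/|f'(x)|, and it grows without bound as the slope f'(x) approaches zero at the stationary point. The curve and standard error below are hypothetical.

```python
def approx_half_width(b1, b2, x, se, z=1.96):
    """First-order (delta-method) half-width, in log10 units, of the x that
    solves b0 + b1*x + b2*x**2 = c, given the standard error se of the fitted
    curve at x. The intercept b0 does not affect the slope."""
    slope = b1 + 2.0 * b2 * x
    return z * se / abs(slope)

# Hypothetical curve with its stationary point at x = -2.2 (slope zero there)
b1, b2, se = -4.4, -1.0, 1.0
for x in (-2.11, -1.8, -1.6):
    print(f"x = {x:.2f}: slope = {b1 + 2 * b2 * x:+.2f}, "
          f"half-width ~ {approx_half_width(b1, b2, x, se):.2f} log10 units")
```

This approximation treats the slope as known, so it cannot show the strongly asymmetric or unbounded intervals that an exact inverse-regression calculation can produce. It only conveys why a flat region gives a wide interval.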

Comparing the concentrations associated with 10 percent and 20 percent decreases based on the solvent control group with those based on the water control group, we see substantially lower point estimates with the solvent control group. The confidence interval associated with a 10 percent decrease from the solvent control group response is so wide (its endpoints differ by a factor of about 11) as to be useless. The other three confidence intervals each span a factor of roughly 2.5 to 3 in concentration, so they are also too wide to provide very precise estimates. However, the confidence interval associated with a 20 percent decrease from the solvent control group is substantially lower than that associated with a 20 percent decrease from the water control group. (The endpoints of the water control interval are 60 percent to 80 percent higher than those of the solvent control interval.)
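The interval widths and the comparison between control groups quoted above are simple ratios of the reported limits. The following short Python check uses only the limits read from Figures XII.3, XII.4, XII.7, and XII.8.

```python
intervals = {
    ("water", 10): (0.0107, 0.0262),
    ("water", 20): (0.0116, 0.0300),
    ("solvent", 10): (0.0023, 0.0254),
    ("solvent", 20): (0.0064, 0.0189),
}

def span_factor(ci):
    """Ratio of the upper to the lower confidence limit."""
    lo, hi = ci
    return hi / lo

for key, ci in intervals.items():
    print(key, f"span factor = {span_factor(ci):.2f}")

w, s = intervals[("water", 20)], intervals[("solvent", 20)]
print(f"water/solvent endpoint ratios (20%): {w[0] / s[0]:.2f}, {w[1] / s[1]:.2f}")
```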

Although the confidence intervals are rather wide, they appear precise enough to conclude that qualitatively different results are obtained depending on whether the water control group or the solvent control group is used for comparison. This creates an important ambiguity in interpretation.


Chapman-Beryllium  Test 


The data consist of water and solvent control groups and six treatment groups. Since the daphnids were individually housed, individual reproduction determinations were made, and these are used for the analysis. Only responses from daphnids that survived to the end of the test are included. The results from the two control groups were combined for comparison with the treatment groups. The results of the analysis are shown in Figures XII.9 to XII.13.

The quadratic model discussed previously was fitted to the data. The independent variable, v, was defined as $v = (x - 1.276)/0.551$ (i.e., $m = 1.276$ and $s = 0.551$). The results of the fit are shown in Figure XII.9. The observed and predicted values are plotted in Figure XII.10 vs log10(concentration). The quadratic term is nonsignificant ($\hat\beta_2 = -6.042$ with estimated standard error 8.433), and Figure XII.10 displays no apparent trend in average reproduction with increasing concentration. The quadratic term was therefore deleted from the model, and the straight line model $Y = \mu + \beta_0 I + \beta_1 I x + \varepsilon$ was fitted to the data. The results are shown in Figure XII.11. The linear term is nonsignificant ($\hat\beta_1 = 5.816$ with estimated standard error 13.165). We thus conclude that there is no significant trend in reproduction within the range of the treatment group concentrations. It is therefore meaningless to estimate concentrations associated with specified reductions from the control response. If we nevertheless carry out the inference formally, we get results such as those shown in Figure XII.12: the confidence interval on the concentration associated with a 10 percent reduction from the control group average ranges from 0 to infinity and is thus meaningless.
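The significance statements for the Chapman coefficients can be checked by forming t ratios, i.e., each estimate divided by its standard error. This sketch uses only the estimates and standard errors quoted above. It compares |t| with 2 as a rough large-sample cutoff for two-sided 5 percent significance. Exact p-values would need the residual degrees of freedom from the regression outputs, which are not used here.

```python
def t_ratio(estimate, std_error):
    """Coefficient estimate divided by its estimated standard error."""
    return estimate / std_error

coefficients = {
    "quadratic term, Figure XII.9": (-6.042, 8.433),
    "linear term, Figure XII.11": (5.816, 13.165),
}
for name, (est, se) in coefficients.items():
    t = t_ratio(est, se)
    print(f"{name}: t = {t:.2f}; |t| < 2: {abs(t) < 2}")
```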

Because of the lack of trend among the treatment group responses, observed in the regression outputs and in the display in Figure XII.10, we calculate an overall level of reproduction within the treatment groups for comparison with the reproduction rate in the control groups. In particular, for the N = 45 surviving daphnids in treatment groups 3-8, the mean and standard deviation are


$\bar Y_1 = 92.844$, $\hat\sigma_1 = 43.242$

For the N = 18 surviving daphnids in control groups 1-2, the mean and standard deviation are

$\bar Y_0 = 162.5$, $\hat\sigma_0 = 57.835$

Comparing the treatment group and control group averages by means of a (one-tailed) two-sample t-test, we obtain

$$t = \frac{\bar Y_1 - \bar Y_0}{\left[\dfrac{44\hat\sigma_1^2 + 17\hat\sigma_0^2}{61}\left(\dfrac{1}{45} + \dfrac{1}{18}\right)\right]^{1/2}} = \frac{92.844 - 162.5}{13.32} = -5.23$$

with 61 degrees of freedom. This statistic is significant at $\alpha < 0.0001$ (reported as 0.0000).
There is thus strong statistical evidence that beryllium diminishes reproduction. The reduction also appears to be substantial: the treatment group average is about 43 percent below the control group average. Since there is no response trend among the treatment groups, we conclude that the concentration at the lowest treatment group produces a (biologically and statistically) significant reduction in reproduction. This concentration is 2.60 µg/l.
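The pooled-variance t statistic above can be reproduced from the summary statistics alone. Below is a minimal Python sketch, with a hypothetical function name, that computes the statistic and its degrees of freedom. A p-value would additionally require the t distribution, for example from SciPy.

```python
import math

def pooled_t(mean1, sd1, n1, mean0, sd0, n0):
    """Two-sample t statistic with a pooled variance estimate.

    Returns (t, degrees of freedom)."""
    df = n1 + n0 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n0 - 1) * sd0 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return (mean1 - mean0) / se, df

# Chapman beryllium reproduction: treatment groups 3-8 vs combined controls
t, df = pooled_t(92.844, 43.242, 45, 162.5, 57.835, 18)
print(f"t = {t:.2f} with {df} degrees of freedom")
```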


Goulden-Isophorone  Test — Group  6  Deleted — Outlier  in  Group  1  Deleted 


There is a control group, and there are four treatment groups (2-5). As with the Chapman data, individual reproduction determinations are used for the analysis, and only responses from daphnids that survived to the end of the test are included. The outlying value (on the high side) in beaker 5 of group 1 is deleted. The previously discussed quadratic model was fitted to the data. The independent variable, v, was defined as $v = (x - 1.745)/0.443$ (i.e., $m = 1.745$, $s = 0.443$). The results of the fit are shown in Figure XII.13, and the observed and predicted values are plotted vs log10(concentration) in Figure XII.14. The quadratic term is highly significant ($\hat\beta_2 = 29.84$ with estimated standard error 4.22), and a strong quadratic trend is evident in Figure XII.14.

The output from CONFINT applied to this model is shown in Figures XII.15 and XII.16. Since the estimated average cumulative reproduction rate in the control group is 67.67 offspring per adult, the values of c corresponding to 10 percent and 20 percent reductions are -6.77 and -13.53 respectively. As in previous examples, the larger of the two roots in each output is the one of biological importance; it corresponds to root no. 1 in the outputs. From the bottoms of Figures XII.15 and XII.16 we read these as


10  percent  decrease:  conc=94.571  95  percent  conf  interval  (77.730,  115.060) 
20  percent  decrease:  conc=103.136  95  percent  conf  interval  (85.841,  123.915) 


Due  to  the  steepness  of  the  response  trend,  both  the  concentrations 
associated  with  10  percent  and  with  20  percent  reductions  in  reproduction 
are  reasonably  precisely  determined.  However,  because  of  this  same 
steepness  in  trend,  these  two  concentrations  are  close  to  one  another  and 
cannot  be  well  separated  by  the  information  from  this  toxicity  test. 
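The closeness of the two estimates can be made concrete. The 20 percent estimate is only about 9 percent higher than the 10 percent estimate, and each point estimate lies inside the other's 95 percent interval. Here is a short check using the values read from Figures XII.15 and XII.16.

```python
est = {10: 94.571, 20: 103.136}
ci = {10: (77.730, 115.060), 20: (85.841, 123.915)}

print(f"20%/10% concentration ratio = {est[20] / est[10]:.3f}")
for p, other in ((10, 20), (20, 10)):
    lo, hi = ci[other]
    print(f"{p}% estimate inside {other}% interval: {lo <= est[p] <= hi}")
for p, (lo, hi) in ci.items():
    print(f"{p}% interval span factor = {hi / lo:.2f}")
```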


C.  LENGTH  RESPONSES 


The situation is essentially the same as for reproduction responses, except for the need to adjust for the effects of beaker-to-beaker heterogeneity in cases where the daphnids are multiply housed (e.g., LeBlanc's Tests A and B). These considerations have been discussed in detail in previous sections (see, e.g., Subsection XI-C) and need not be repeated here. The models fitted and the notation used are the same as those for the reproduction response. We consider several examples.


LeBlanc  Test  A — Solvent  Control  Group — Group  7  Deleted 

There are four treatment groups and a control group. The concentrations corresponding to the two lowest treatment groups, Groups 3 and 4, are essentially the same. The quadratic model fitted is

$$Y = \mu + \beta_0 I + \beta_1 I v + \beta_2 I v^2 + \varepsilon$$

where $v = (x + 2.257)/0.311$. The results of this fit are shown in Figure XII.17. The observed and predicted values are plotted in Figure XII.18 vs log10(concentration). The quadratic term in Figure XII.17 has the wrong sign (positive), is small, and is nonsignificant. The display in Figure XII.18 shows no concentration-related trend in average length within the range of the treatment groups. All the treatment groups appear to have substantially lower average length than the solvent control group.

The analysis shown in Figure XII.17 was carried out on a per-beaker basis. That is, average lengths were calculated within each beaker, and regression models were fitted to these averages. This resolves the issue of correlated responses within beakers. The residual mean square in Figure XII.17 corresponds conceptually to the mean square for beakers within groups that was discussed in Subsection V-D. Suppose the data had been completely balanced (i.e., equal numbers of daphnids per beaker) and the same models had been fitted to the data (an analysis of variance model was used in Subsection V-D, whereas a quadratic regression model was used in this analysis). The two mean squares would then agree exactly, once the beakers-within-groups mean square is rescaled to the scale of beaker averages as described below.

Since this data set is nearly balanced (except for two beakers in Group 6 with small numbers of daphnids), the two mean squares should be very close, and this is in fact the case. The mean square for beakers within groups was calculated in Subsection V-D to be MSI = 0.2722. The average number of daphnids per beaker is N = 16.7. When MSI is normalized to correspond to the variability of beaker averages, the estimated variance is approximately MSI/N = 0.2722/16.7 = 0.0163. This is very similar to the error mean square of 0.0176 calculated in Figure XII.17. The discussion in Subsection V-D about augmenting the degrees of freedom of MSI by pooling information from the variability among daphnids within beakers therefore also holds for this analysis. We might thus assign 98 degrees of freedom (the number calculated in Subsection V-D) to the error mean square, rather than the 16 degrees of freedom obtained directly from the regression output. This would reduce the observed significance level of the F test for the quadratic coefficient from 0.265 to 0.251. If we also used the mean square value 0.0163 from Subsection V-D, the observed significance level would fall further to 0.233. Since such minor changes are of no practical importance, we do not pursue this possibility further.
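The effect of crediting the error mean square with more degrees of freedom can be reproduced from the quoted significance level alone. This sketch backs out the F statistic from the reported level 0.265 on (1, 16) degrees of freedom, assuming that value is the exact tail probability. It then re-evaluates the statistic on (1, 98) degrees of freedom, and again with the error mean square 0.0163 in place of 0.0176. It requires SciPy, and the printed levels should come out close to the 0.251 and 0.233 quoted above.

```python
from scipy import stats

p_16 = 0.265                       # observed level on (1, 16) df, from the regression output
F = stats.f.isf(p_16, 1, 16)       # F statistic implied by that level

# Same F, with the error mean square credited with 98 degrees of freedom
p_98 = stats.f.sf(F, 1, 98)

# Rescale F to use the Subsection V-D variance 0.0163 instead of 0.0176
F_vd = F * 0.0176 / 0.0163
p_vd = stats.f.sf(F_vd, 1, 98)

print(f"F = {F:.3f}; p(1,16) = {p_16}; p(1,98) = {p_98:.3f}; with MS 0.0163: {p_vd:.3f}")
```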

The quadratic term was deleted from the model, and the straight line model $Y = \mu + \beta_0 I + \beta_1 I x + \varepsilon$ was fitted to the data. The results are shown in Figure XII.19. The linear term is essentially 0 and is nonsignificant ($\hat\beta_1 = 0.029$ with estimated standard error 0.111). We thus conclude that there is no significant trend in average length within the range of the treatment group concentrations. This is confirmed when we attempt to estimate concentrations resulting in specified reductions in average length relative to the control group. As shown in Figure XII.20, the confidence interval on the concentration associated with a 10 percent reduction from the control group average ranges from 0 to infinity. It is thus meaningless.

Because of the lack of trend among the treatment group responses, we calculate an overall average length within the treatment groups for comparison with that in the control group. For the N = 16 beakers in treatment groups 3-6, the mean and standard deviation of the beaker averages are

$\bar Y_1 = 5.023$, $\hat\sigma_1 = 0.14032$

For the N = 4 beakers in the solvent control group, the mean and standard deviation of the beaker averages are

$\bar Y_0 = 5.2775$, $\hat\sigma_0 = 0.05852$

Comparing the treatment group and control group averages by means of a (one-tailed) two-sample t-test, we obtain

$$t = \frac{\bar Y_1 - \bar Y_0}{\left[\dfrac{15\hat\sigma_1^2 + 3\hat\sigma_0^2}{18}\left(\dfrac{1}{16} + \dfrac{1}{4}\right)\right]^{1/2}} = \frac{5.0231 - 5.2775}{0.0732} = -3.475$$

with 18 degrees of freedom. This statistic is significant at $\alpha = 0.001$. If we had used the mean square calculated in Subsection V-D, namely 0.2722/16.7 = 0.0163 with 98 degrees of freedom, then t would equal -3.565 with 98 degrees of freedom, which is significant at $\alpha = 0.0003$. The conclusions are thus unchanged for all practical purposes.
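The alternative t value that uses the Subsection V-D variance follows directly from the quoted numbers. This minimal sketch treats 0.2722/16.7 as the known variance of a beaker average and uses 16 treatment beakers and 4 control beakers.

```python
import math

mean_trt, mean_ctl = 5.0231, 5.2775    # beaker-average means (mm)
ms = 0.2722 / 16.7                     # Subsection V-D mean square on the beaker-average scale
se = math.sqrt(ms * (1 / 16 + 1 / 4))  # 16 treatment beakers, 4 control beakers
t = (mean_trt - mean_ctl) / se
print(f"variance = {ms:.4f}, t = {t:.3f} (98 degrees of freedom)")
```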

There is strong statistical evidence that the toxicant in Test A diminishes average length in the treatment groups relative to that in the solvent control group. Whether the magnitude of the decrease is of biological importance is a matter for biological judgement. Since there is no trend in response among the treatment groups, we conclude that the concentration corresponding to the lowest treatment group produces a statistically significant reduction in average length relative to the solvent control group. This concentration is 0.00290 mg/l.

Regression  models  fitted  to  these  data  with  the  water  control  group  in 
place  of  the  solvent  control  group  would  yield  similar  results  except  that 
the  treatment  group  responses  do  not  differ  from  the  water  control  group 
response. 


Chapman-Beryllium Test


The setup is the same as that for the reproduction responses. Since daphnids are individually housed in this test, there are no complications due to beaker-to-beaker heterogeneity. The results of the analysis are shown in Figures XII.21 to XII.25.

The previously discussed quadratic model was fitted to the data. The independent variable, v, was defined as $v = (x - 1.276)/0.551$. The results of the fit are shown in Figure XII.21, and the observed and predicted values are plotted in Figure XII.22 vs log10(concentration). It is evident from both these displays that the trend within the treatment groups is linear, i.e., the quadratic component is essentially 0. The quadratic term was thus deleted from the model, and the straight line model $Y = \mu + \beta_0 I + \beta_1 I x + \varepsilon$ was fitted to the data. The results are shown in Figure XII.23. The linear term is very highly significant, as is also evident from Figure XII.22.

The output from CONFINT applied to this model is shown in Figures XII.24 and XII.25. We calculate point and confidence interval estimates of the concentrations corresponding to 10 percent and 20 percent reductions in average 21-day length relative to the control group. Since the estimated average 21-day length in the control group is 4.278 mm, the values of c specified are -0.428 and -0.856. Since we are dealing with a straight line model, there is just one root. From the bottoms of Figures XII.24 and XII.25 we read these as


10  percent  decrease:  conc=17.588  95  percent  conf  interval  (5.10,  60.59) 

20  percent  decrease:  conc=359.205  95  percent  conf  interval  (49.31,  2616.57) 


The point estimate of the concentration corresponding to a 20 percent reduction in length exceeds the highest treatment group concentration. It thus represents extrapolation beyond the range of the data, with the consequent danger of extrapolation bias. The confidence intervals are too wide to be useful: the ranges of concentrations in these intervals span factors of about 12 and 53 respectively. We must therefore conclude that the results of this test do not provide enough information to estimate precisely the concentrations associated with 10 percent and 20 percent reductions in average length relative to the control group.
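With the straight line model the target c is reached where b0 + b1*x = c. There is therefore a single root x = (c - b0)/b1 in log10(concentration), and the concentration is 10**x. This sketch shows that inversion, with the treatment indicator I = 1. The coefficients are hypothetical placeholders, not the Chapman estimates, and the interval calculation done by CONFINT is not reproduced.

```python
def linear_root_concentration(b0, b1, c):
    """Solve b0 + b1*x = c for x = log10(concentration); return (x, conc)."""
    if b1 == 0:
        raise ValueError("zero slope: no unique concentration")
    x = (c - b0) / b1
    return x, 10 ** x

# Hypothetical coefficients for 21-day length (mm); c = -0.428 for a 10% reduction
x, conc = linear_root_concentration(b0=0.2, b1=-0.5, c=-0.428)
print(f"log10(conc) = {x:.3f}, conc = {conc:.2f}")
```

With a shallow slope b1, a small change in c moves x by a large amount. This is why the 20 percent estimate lands far beyond the 10 percent estimate here.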


Figure XII.2. Observed and predicted values from quadratic regression model of cumulative reproduction per surviving adult versus log10(concentration). LeBlanc test A - water control group - group 7 deleted.

Point and confidence interval estimates of concentration associated with 10 percent reduction in average reproduction relative to control group. LeBlanc test A - water control group - group 7 deleted.

Point and confidence interval estimates of concentration associated with 20 percent reduction in average reproduction relative to control group. LeBlanc test A - water control group - group 7 deleted.

LeBlanc test A - solvent control group - group 7 deleted. Output from quadratic regression of cumulative reproduction per surviving adult versus log10(concentration).

Observed and predicted values from quadratic regression model of cumulative reproduction per surviving adult versus log10(concentration). LeBlanc test A - solvent control group - group 7 deleted.

Figure XII.7. Point and confidence interval estimates of concentration associated with 10 percent reduction in average reproduction relative to control group. LeBlanc test A - solvent control group - group 7 deleted.

Chapman beryllium test - combined control groups. Output from linear regression model of cumulative reproduction versus log10(concentration). Only surviving daphnids are included.

Figure XII.15. Point and confidence interval estimates of concentrations associated with 10 percent reduction in average reproduction relative to control group - Goulden isophorone - group 6 deleted - outlier in group 1 deleted.

Figure XII.16. Point and confidence interval estimates of concentrations associated with 20 percent reduction in average reproduction relative to the control group - Goulden isophorone - group 6 deleted - outlier in group 1 deleted.

Figure XII.19. LeBlanc test A - solvent control group - group 7 deleted. Output of linear regression model of 21 day lengths of surviving adults versus log10(concentration).

Figure XII.21. Chapman beryllium test - combined control groups. Output from quadratic regression model of 21 day lengths of surviving adults versus log10(concentration).

Figure XII.23. Chapman beryllium test - combined control groups. Output from linear regression model of 21 day lengths versus log10(concentration). Only surviving daphnids are included.

Point and confidence interval estimates of concentration associated with 20 percent reduction in average 21 day length relative to control group. Chapman beryllium test - combined control groups.
XIII. CONFIDENCE INTERVAL PROCEDURES FOR COMPARISON OF
EFFECT LEVELS BETWEEN TREATMENT AND CONTROL GROUPS


A.  INTRODUCTION 


In  Sections  IX-XII  we  considered  the  comparisons  of  effect  levels  in 
the  treatment  groups  with  those  in  the  control  groups  for  survival, 
reproduction,  and  length  responses.  We  utilized  hypothesis  testing, 
multiple  comparisons,  and  dose  response  estimation  procedures.  In  this 
section  we  construct  confidence  intervals  to  quantify  the  extent  of  the 
differences  between  the  treatment  group  and  control  group  effect  levels. 

The  values  contained  within  the  confidence  intervals  indicate  the  extent  of 
biological  significance  of  these  effect  differences.  The  widths  of  the 
confidence  intervals  indicate  the  degree  of  precision  in  the  data  for 
estimating  these  differences.  Narrow  confidence  intervals  signify  precise 
estimates  while  wide  confidence  intervals  signify  imprecise  estimates.  By 
contrast,  hypothesis  testing  procedures  merely  state  whether  the  null 
hypothesis  was  accepted  or  rejected;  they  give  no  indication  of  the  extent 
of  the  effect. 

Confidence interval comparisons of treatment group and control group average responses are also a very useful adjunct to hypothesis testing procedures, particularly at the MATC. After an MATC has been determined, it is worthwhile to calculate a confidence interval there. The upper confidence bound (for mortality) or the lower confidence bound (for length or reproduction) indicates how much worse than the control group the response at the MATC could conceivably be. If this confidence bound is biologically very undesirable, then the MATC might be too high a concentration even though it is not statistically significantly different from the control. It might then make sense to be more conservative and report a concentration lower than the MATC. This situation will occur most frequently when the hypothesis testing procedure has very poor power. Thus the calculation and use of confidence intervals at the MATC can help to alleviate one of the principal weaknesses of the hypothesis testing approach to determining safe concentrations.

Confidence interval procedures are discussed in a number of places. Feder and Collins [1], Section XIII, discuss a number of approaches for placing confidence intervals on the ratios of treatment group mortality rates to control group rates in fathead minnow early life stage tests. In this section we use some of these methods, as well as others, to place confidence intervals on pairwise treatment-control ratios and differences in survival, length, and reproduction effects observed in 21-day chronic Daphnia tests.

We  account  for  possible  beaker-to-beaker  heterogeneity  of  effects  within 
groups  by  the  adjustment  techniques  discussed  in  Section  V. 

We illustrate confidence interval calculations using unsmoothed effect levels (i.e., using average observed effect levels within each treatment group, unadjusted for those in the other groups) and using smoothed effect levels (i.e., using effect levels based on the predictions from regression models fitted to all the treatment groups). We illustrate these approaches for each of survival, length, and reproduction.


B. MORTALITY - UNSMOOTHED RESPONSES


Feder and Collins [1] discuss three approaches to the construction of confidence intervals for mortality data. These are based on large-sample normal theory, on exact small-sample theory using the noncentral conditional distribution of the 2x2 contingency table, and on Poisson theory (most appropriate when the response probabilities are small). We illustrate here the asymptotic approach, the Poisson approach, and a variant of the exact approach due to Thomas and Gart [30]. We illustrate these procedures with 21-day cumulative mortality data from several of the examples considered previously.


Asymptotic  Approach 


Let $p_c, p_t$ denote the (population) mortality probabilities in the control group and in a treatment group respectively, and let $q = 1 - p$. Let $N_c, N_t$ denote the associated (effective) sample sizes in these groups, let $\phi = \ln(p_t/p_c)$, and let $\hat p_t, \hat p_c, \hat\phi$ denote estimates of these quantities. Feder and Collins [1], Subsection XIII-B, state that an asymptotic 95% confidence interval on $\phi$ is

$$\hat\phi - 1.96\left[\frac{\hat q_c}{N_c \hat p_c} + \frac{\hat q_t}{N_t \hat p_t}\right]^{1/2} \le \phi \le \hat\phi + 1.96\left[\frac{\hat q_c}{N_c \hat p_c} + \frac{\hat q_t}{N_t \hat p_t}\right]^{1/2}$$

This interval is valid asymptotically, as $N_c, N_t \to \infty$ with $p_c, p_t$ fixed.

In the case of LeBlanc's Test A, the effective sample sizes and observed mortality rates in the solvent control group and in treatment Groups 5 and 6 are

$N_2 = N_5 = 59.2$, $N_6 = 5.8$, $\hat p_2 = 2.2/59.2 = 0.037$, $\hat p_5 = 5.2/59.2 = 0.088$, $\hat p_6 = 0.50$.

Substituting $\hat p_2, \hat p_5, \hat p_6$ for the corresponding parameters in the confidence interval expression, we obtain the following results:

Group 5 vs Solvent Control: $\hat\phi = \ln(\hat p_5/\hat p_2) = \ln 2.364 = 0.860$

95% confidence interval on $\phi$: $(0.860 - 1.96(0.784),\ 0.860 + 1.96(0.784)) = (-0.677, 2.397)$

95% confidence interval on $p_5/p_2$: $(e^{-0.677}, e^{2.397}) = (0.51, 10.99)$

Group 6 vs Solvent Control: $\hat\phi = \ln 13.514 = 2.604$

95% confidence interval on $\phi$: $(2.604 - 1.96(0.589),\ 2.604 + 1.96(0.589)) = (1.448, 3.759)$

95% confidence interval on $p_6/p_2$: $(e^{1.448}, e^{3.759}) = (4.26, 42.91)$

We conclude from these confidence interval calculations that:

•  $p_6$ is significantly different from $p_2$ at the 5% level of significance, but $p_5$ is not (since the latter confidence interval includes 1 while the former does not).

•  $p_6$ is substantially greater than $p_2$ (since the lower confidence bound on the ratio is 4.26).

•  Neither $p_5/p_2$ nor $p_6/p_2$ is determined very precisely. The data are compatible with $p_5$ being just half of $p_2$ or being ten times greater than $p_2$. The data are compatible with $p_6$ being anywhere between 4.3 and 43 times greater than $p_2$. Thus these ratios are not determined even to within an order of magnitude.

It  is  interesting  to  note  that  these  results  are  compatible  with  those 
obtained  in  Section  IX,  based  on  various  hypothesis  testing  procedures, 
where  it  was  shown  that  Group  5  is  the  MATC. 

With Goulden's isophorone data there was no adjustment of sample sizes for beaker-to-beaker heterogeneity. The sample sizes and observed mortality rates among the multiply housed daphnids in the control group and in treatment groups 4, 5, and 6 are $N_1 = N_4 = N_5 = N_6 = 15$, $\hat p_1 = 0.133$, $\hat p_4 = 0.200$, $\hat p_5 = 0.533$, $\hat p_6 = 0.867$. Thus the point estimates of, and asymptotic 95% confidence intervals on, $p_4/p_1$, $p_5/p_1$, and $p_6/p_1$ are:

Group 4 vs Control: $\hat p_4/\hat p_1 = 1.50$

95% confidence interval on $p_4/p_1$: (0.29, 7.76)

Group 5 vs Control: $\hat p_5/\hat p_1 = 4.00$

95% confidence interval on $p_5/p_1$: (1.01, 15.87)

Group 6 vs Control: $\hat p_6/\hat p_1 = 6.50$

95% confidence interval on $p_6/p_1$: (1.76, 24.09)

We conclude from these confidence interval calculations that:

•  $p_5$ and $p_6$ are significantly different from $p_1$ at the 5% level of significance, but $p_4$ is not.

•  $p_5$ and $p_6$ may be about the same as $p_1$ or may be an order of magnitude greater.

•  None of the ratios $p_4/p_1$, $p_5/p_1$, or $p_6/p_1$ is determined very precisely. These data cannot determine the ratios even to within an order of magnitude.
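The asymptotic intervals for the Goulden comparisons follow directly from the interval formula for $\phi$. This minimal Python sketch uses a hypothetical function name. It treats the rounded proportions 0.133, 0.200, 0.533, and 0.867 as the estimates, with 15 daphnids per group.

```python
import math

def log_ratio_ci(p_t, n_t, p_c, n_c, z=1.96):
    """Asymptotic CI for p_t/p_c based on phi = ln(p_t/p_c).

    Var(phi_hat) ~ q_c/(n_c p_c) + q_t/(n_t p_t); requires p_t, p_c > 0."""
    phi = math.log(p_t / p_c)
    se = math.sqrt((1 - p_c) / (n_c * p_c) + (1 - p_t) / (n_t * p_t))
    return math.exp(phi - z * se), math.exp(phi + z * se)

# Goulden isophorone: control p1 = 0.133, 15 daphnids in every group
for group, p in ((4, 0.200), (5, 0.533), (6, 0.867)):
    lo, hi = log_ratio_ci(p, 15, 0.133, 15)
    print(f"Group {group} vs control: 95% CI on ratio = ({lo:.2f}, {hi:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric about the point estimate of the ratio.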


Poisson  Approach 


The previous subsection discussed the construction of confidence intervals based on asymptotic normal theory. The normal approximation to the binomial distribution is generally considered adequate if Np > 5 and Nq > 5 for each group. For several of the groups, however, the expected frequencies were as low as 2. The validity of the asymptotic theory is therefore open to question.

Another  approximate  approach  to  constructing  confidence  intervals  on 
ratios  of  mortality  probabilities  is  based  on  the  Poisson  approximation  to 
the  binomial  distribution.  This  approximation  is  best  when  both  the 
control  group  and  treatment  group  probabilities  are  small,  about  0.10  or 
less.  The  approximation  holds  for  both  small  and  large  sample  sizes. 

Feder and Collins [1], Subsection XIII-D, discuss this approach. Following Nelson [31], they state that an approximate $1-\alpha$ level two-sided confidence interval on $\lambda_t/\lambda_c$ is given by the expression below. Here $X_c, N_c$ and $X_t, N_t$ are the (adjusted) responses and sample sizes in the control and treatment groups respectively, $p_c, p_t$ are the response probabilities in these groups, and $\lambda_c = N_c p_c$, $\lambda_t = N_t p_t$.

$$\frac{X_t}{X_c + 1}\cdot\frac{1}{F(2X_c + 2,\, 2X_t;\, 1 - \alpha_1)} \le \frac{\lambda_t}{\lambda_c} \le \frac{X_t + 1}{X_c}\, F(2X_t + 2,\, 2X_c;\, 1 - \alpha_2)$$

where $F(\nu_1, \nu_2; \gamma)$ represents the $100\gamma$-th percentile of the F-distribution with degrees of freedom $\nu_1, \nu_2$, and where $\alpha_1 + \alpha_2 = \alpha$. Now

$$\frac{\lambda_t}{\lambda_c} = \frac{N_t p_t}{N_c p_c} = \frac{N_t}{N_c}\cdot\frac{p_t}{p_c}$$

Thus multiplying the above confidence bounds by $N_c/N_t$ yields confidence bounds on $p_t/p_c$. Note that if $X_t = 0$ the lower bound is 0, while if $X_c = 0$ the upper bound is infinite.
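The Poisson-based bounds can be evaluated with a statistics library instead of charts or tables. This is a minimal sketch, with a hypothetical function name, that assumes SciPy's F distribution for the percentiles. SciPy accepts nonintegral degrees of freedom directly, so no table interpolation is needed. Because of this, results may differ slightly from hand calculations based on tabled percentiles.

```python
from scipy import stats

def poisson_ratio_ci(x_t, n_t, x_c, n_c, alpha1=0.025, alpha2=0.025):
    """Approximate CI on p_t/p_c from the Poisson approximation.

    Bounds on lambda_t/lambda_c are converted to bounds on p_t/p_c by
    multiplying by n_c/n_t; counts may be nonintegral (adjusted responses)."""
    scale = n_c / n_t
    if x_t == 0:
        lower = 0.0
    else:
        lower = x_t / (x_c + 1) / stats.f.ppf(1 - alpha1, 2 * x_c + 2, 2 * x_t)
    if x_c == 0:
        upper = float("inf")
    else:
        upper = (x_t + 1) / x_c * stats.f.ppf(1 - alpha2, 2 * x_t + 2, 2 * x_c)
    return scale * lower, scale * upper

# Goulden isophorone, Group 4 vs control: X4 = 3, X1 = 2, N = 15 in each group
lo, hi = poisson_ratio_ci(3, 15, 2, 15)
print(f"approximate 95% CI on p4/p1: ({lo:.2f}, {hi:.2f})")
```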

Nelson [31] presents charts that facilitate the construction of two-sided 90%, 95%, or 99% confidence intervals. These charts are illustrated, and their use is discussed, in Feder and Collins [1].

We illustrate the Poisson-based confidence interval procedure with several examples. First consider the comparison of treatment Group 5 with the solvent control group in LeBlanc's Test A. In this case $N_2 = N_5 = 59.2$, $X_2 = 2.2$, $X_5 = 5.2$, so $\hat p_2$ and $\hat p_5$ are both less than 0.10. As indicated in the discussion of asymptotic confidence intervals,

$\hat p_5/\hat p_2 = 2.364$

Substituting $X_5, X_2$ for $X_t, X_c$ respectively and $\alpha_1 = \alpha_2 = 0.025$ in the confidence interval expression, we calculate an approximate 95% confidence interval on $p_5/p_2$. Namely


The  percentiles  of  the  F-distribution  with  nonintegral  degrees  of  freedom 
were  obtained  by  linear  interpolation. 


This  confidence  interval  is  qualitatively  similar  to  but  slightly  wider 
than  the  asymptotic  theory  confidence  interval. 

We next consider the comparison of treatment Group 4 with the control group in Goulden's Isophorone test. The observed mortality rates in these groups are 0.200 and 0.133 respectively, so we are stretching the Poisson theory a bit. However, the errors made should be on the conservative side (i.e., overly long intervals). In this example $N_1 = N_4 = 15$, $X_1 = 2$, $X_4 = 3$. Thus

$$\hat p_4/\hat p_1 = 1.50$$


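Substituting $X_4, X_1$ for $X_t, X_c$ and $\alpha_1 = \alpha_2 = 0.025$ in the confidence interval expression (here $N_1/N_4 = 1$, so no rescaling is needed), we obtain

$$\left( \frac{3}{3\,F(6,6;0.975)},\ \frac{4\,F(8,4;0.975)}{2} \right) = (0.17,\ 17.84)$$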


This  interval  is  very  much  wider  than  the  asymptotic  confidence  interval. 
Thus  the  Poisson  interval  is  not  consistent  with  the  asymptotic  interval  in 
this example. We will be able to assess the relative merits of the asymptotic and Poisson confidence intervals for this example when we consider the calculation of exact confidence intervals below.
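The Poisson based interval can be computed directly from its F percentiles. The following is a minimal Python sketch under stated assumptions, not the procedure used in the report: it implements only the formula given above, and evaluates the F percentiles with a standard continued-fraction evaluation of the regularized incomplete beta function plus bisection, so no statistical library is needed. The function names (`f_ppf`, `poisson_ratio_ci`, and the helpers) are hypothetical. Because the incomplete beta function accepts nonintegral arguments, the sketch also handles nonintegral degrees of freedom directly, instead of by linear interpolation. With exact percentiles, $F(6,6;0.975) \approx 5.82$ and $F(8,4;0.975) \approx 8.98$, so for the Goulden Group 4 comparison the sketch returns approximately (0.17, 17.96); the upper value is a little larger than the 17.84 printed above, presumably because a slightly different tabled F percentile was used.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # even and odd coefficients of the continued fraction for this k
        for aa in (k * (b - k) * x / ((qam + k2) * (a + k2)),
                   -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, d1, d2):
    """P(F <= f) for an F(d1, d2) variable; d1, d2 need not be integers."""
    if f <= 0.0:
        return 0.0
    return betainc(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))

def f_ppf(p, d1, d2):
    """100p-th percentile of F(d1, d2), found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:   # bracket the percentile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_ratio_ci(xt, xc, nt, nc, alpha1=0.025, alpha2=0.025):
    """Poisson based confidence interval on p_t/p_c from the F formula above."""
    lower = 0.0 if xt == 0 else xt / ((xc + 1) * f_ppf(1 - alpha1, 2 * xc + 2, 2 * xt))
    upper = math.inf if xc == 0 else (xt + 1) * f_ppf(1 - alpha2, 2 * xt + 2, 2 * xc) / xc
    scale = nc / nt  # converts bounds on lambda_t/lambda_c into bounds on p_t/p_c
    return lower * scale, upper * scale

# Goulden Isophorone test, Group 4 (X_4 = 3) vs control (X_1 = 2), N = 15 each
lo, hi = poisson_ratio_ci(xt=3, xc=2, nt=15, nc=15)
print(f"({lo:.2f}, {hi:.2f})")
```

Where SciPy is available, `scipy.stats.f.ppf(gamma, nu1, nu2)` gives the same percentiles and could replace the hand-written `f_ppf`.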


Exact, Small Sample, Conditional Approach


If the sample sizes are not sufficiently large to apply the asymptotic confidence interval procedure, and if the response proportions are not sufficiently small to apply the Poisson confidence interval procedure, then confidence interval comparisons between the treatment groups and control group(s) can be made by an exact, small sample procedure based on the non-null distribution of Fisher's exact test, conditional on the margins of a 2-by-2 contingency table.


Let $p_t, p_c$ denote the response (mortality) probabilities in the treatment and control groups respectively, and let $q = 1 - p$. Feder and Collins [1], Subsection XIII.C, discuss procedures for placing exact, small sample confidence intervals on the odds ratio

$$\rho = \frac{p_c\, q_t}{p_t\, q_c}$$


based  on  an  algorithm  by  Thomas  [32] .  Thomas'  algorithm  has  been 
implemented  in  EXAX2  [2]  .  Thomas  and  Gart  [30]  have  extended  this 
procedure  to  place  exact  confidence  intervals  on  differences  and  ratios  of 
response  probabilities,  based  on  the  confidence  interval  on  the  odds  ratio. 
They  present  tables  of  such  confidence  intervals  for  a  wide  variety  of 
possible  outcomes  of  2-by-2  contingency  tables. 


We illustrate the small sample, conditional approach with several examples based on Goulden's Isophorone test. The odds ratios and associated 95% confidence intervals are calculated by EXAX2. Results for pairwise comparisons of Groups 4, 5, and 6 with the control group are shown in Figure XIII.1.

Group 4 vs Control: $\hat\rho = 0.615$

95% confidence interval on $\rho$ is $(0.045, 6.484) = (\rho_L, \rho_U)$

Let $x_L, x_U$ denote the numbers of dead control animals that would correspond to odds ratios $\rho_L, \rho_U$, conditional on the marginals of the table. Following Thomas and Gart [30], we have (for $m = X_1 + X_4 = 5$ total deaths and $N_1 = N_4 = 15$)


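$$\rho_L = \frac{x_L/(15-x_L)}{(5-x_L)/\big(15-(5-x_L)\big)}, \qquad \rho_U = \frac{x_U/(15-x_U)}{(5-x_U)/\big(15-(5-x_U)\big)}$$

This results in the quadratic equations

$$(\rho_L-1)\,x_L^2-(10+20\rho_L)\,x_L+75\rho_L=0$$
$$(\rho_U-1)\,x_U^2-(10+20\rho_U)\,x_U+75\rho_U=0$$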

Substituting the values of $\rho_L, \rho_U$, solving the quadratic equations, and retaining the roots that lie between 0 and 5 yields

$x_L = 0.3008$

$x_U = 4.1614$

These values yield confidence bounds on $p_4/p_1$ by noting that $p_4/p_1$ is estimated by $(5-x)/x$. Thus $\big((5-x_U)/x_U,\ (5-x_L)/x_L\big)$ constitutes an exact, small sample $1-\alpha$ level two sided confidence interval on $p_4/p_1$. This yields the interval (0.20, 15.62). This interval is a bit shorter than the Poisson based interval, but is very similar. It appears that the asymptotic interval is too short in this example. However, the asymptotic, Poisson, and conditional intervals all lead to the same qualitative conclusions: there is no statistical evidence that $p_1$ and $p_4$ differ, and the ratio $p_4/p_1$ cannot be determined very precisely based on just the pairwise treatment-control comparison.
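The root-finding step above can be written for general group sizes. With the odds ratio $\rho = p_c q_t/(p_t q_c)$, $x$ dead control animals, and $m$ total deaths, a value of $\rho$ corresponds to the equation $\rho(N_c - x)(m - x) = x(N_t - m + x)$, which reduces to the quadratics in the text for $N_c = N_t = 15$ and $m = 5, 10, 15$. The Python sketch below (function names hypothetical) solves that equation within the table margins and converts the two roots into bounds on $p_t/p_c$ through $((m-x)/N_t)/(x/N_c)$, which equals $(m-x)/x$ when $N_t = N_c$. It takes the odds ratio limits as inputs (in the report they come from EXAX2) and does not compute them. Because the inputs are the rounded limits 0.045 and 6.484, the roots can differ slightly in the last digits from those printed (for example $x_L \approx 0.3017$ rather than 0.3008).

```python
import math

def control_deaths_for_odds_ratio(rho, m, nt, nc):
    """Solve rho*(nc - x)*(m - x) = x*(nt - m + x) for x inside the table margins.

    x is the (possibly fractional) number of dead control animals, m the total
    number of deaths, nt and nc the treatment and control group sizes.
    """
    lo, hi = max(0, m - nt), min(m, nc)
    a = rho - 1.0
    b = -(rho * (nc + m) + (nt - m))
    c = rho * nc * m
    if abs(a) < 1e-12:  # rho == 1: the equation is linear
        roots = [-c / b]
    else:
        disc = math.sqrt(b * b - 4.0 * a * c)
        roots = [(-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)]
    for x in roots:
        if lo <= x <= hi:
            return x
    raise ValueError("no root inside the table margins")

def conditional_ratio_ci(rho_l, rho_u, m, nt, nc):
    """Map odds ratio limits (rho_l, rho_u) to an interval on p_t/p_c."""
    x_l = control_deaths_for_odds_ratio(rho_l, m, nt, nc)
    x_u = control_deaths_for_odds_ratio(rho_u, m, nt, nc)

    def ratio(x):
        return ((m - x) / nt) / (x / nc)

    # a larger rho means more control deaths, hence a smaller p_t/p_c
    return x_l, x_u, (ratio(x_u), ratio(x_l))

# Goulden Isophorone test, Group 4 vs control: m = 5, N = 15 in each group
x_l, x_u, (lo, hi) = conditional_ratio_ci(0.045, 6.484, m=5, nt=15, nc=15)
print(f"x_L = {x_l:.4f}, x_U = {x_u:.4f}, interval = ({lo:.2f}, {hi:.2f})")
```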

Group 5 vs Control: $\hat\rho = 0.135$

95% confidence interval on $\rho$ is $(0.0118, 1.0012) = (\rho_L, \rho_U)$

Let $x_L, x_U$ have the same interpretations as in the comparison of Group 4 with the control. Using the same procedure as discussed there (now with $m = 10$), we obtain the quadratic equations

$$(\rho_L-1)\,x_L^2-(5+25\rho_L)\,x_L+150\rho_L=0$$
$$(\rho_U-1)\,x_U^2-(5+25\rho_U)\,x_U+150\rho_U=0$$


This yields $x_L = 0.3168$, $x_U = 5.0019$. Since $p_5/p_1$ is estimated by $(10-x)/x$, we obtain the 95% two sided confidence interval on $p_5/p_1$ to be

$\big((10-x_U)/x_U,\ (10-x_L)/x_L\big) = (0.999, 30.57)$.

The lower endpoint of this interval is in good agreement with the asymptotic interval, but the upper endpoint is much greater. However, both intervals lead to the same qualitative conclusions, namely that $p_5$ is greater than $p_1$ at or very near the one sided 0.025 significance level (the lower endpoint, 0.999, is essentially 1), and that the ratio $p_5/p_1$ cannot be determined very precisely based on the pairwise treatment-control comparison.

Group 6 vs Control: $\hat\rho = 0.024$

95% confidence interval on $\rho$ is $(0.0018, 0.2511) = (\rho_L, \rho_U)$

Using the same relations discussed previously, $x_L$ and $x_U$ are determined as the roots of the quadratic equations

$$(\rho_L-1)\,x_L^2-30\rho_L\,x_L+225\rho_L=0$$
$$(\rho_U-1)\,x_U^2-30\rho_U\,x_U+225\rho_U=0$$

that lie between 0 and 15. This yields $x_L = 0.6063$, $x_U = 5.0076$. Since $p_6/p_1$ is estimated by $(15-x)/x$, we obtain the 95% two sided confidence interval on $p_6/p_1$ to be $\big((15-x_U)/x_U,\ (15-x_L)/x_L\big) = (1.995, 23.74)$. This interval is in rather good agreement with that based on asymptotic theory. It shows strong statistical evidence that $p_6$ is greater than $p_1$; however, the ratio $p_6/p_1$ cannot be determined very precisely based on the pairwise treatment-control comparison.

We have discussed a procedure for constructing exact, small sample confidence intervals on the ratios of mortality probabilities and have compared the results from this procedure to confidence intervals based on Poisson or on asymptotic theory. The asymptotic confidence intervals appear to be too short when the group sizes are as small as those in the Goulden test (N = 15) and the mortality probabilities are relatively small. However, all the confidence interval procedures yielded the same qualitative conclusions. All the confidence intervals constructed indicated that ratios of mortality probabilities could not be determined very precisely based on unadjusted pairwise treatment-control comparisons, with the numbers of daphnids per group and the magnitudes of mortality probabilities encountered in these tests. Note that we have not adjusted for simultaneity. Such adjustments can easily be carried out using Bonferroni's method.
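As a worked illustration of the Bonferroni adjustment mentioned above (an illustration only; the adjustment is not carried out in this report): for the $k = 3$ comparisons of Groups 4, 5, and 6 with the control in Goulden's Isophorone test, a simultaneous confidence level of at least 95% is obtained by computing each interval at level $1 - 0.05/k$ and splitting the tail probability equally between the two ends:

$$\alpha^{*} = \frac{0.05}{3} \approx 0.0167, \qquad \alpha_1 = \alpha_2 = \frac{\alpha^{*}}{2} \approx 0.0083$$

Each pairwise interval would then be a 98.33% interval; for the Poisson based interval this means using the percentiles $F(\cdot,\cdot;\,0.9917)$ in place of $F(\cdot,\cdot;\,0.975)$.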

The preceding approach can also be easily adapted to constructing exact, small sample conditional confidence intervals on the differences between the treatment group mortality rates and the control group rate. Again following the notation in Thomas and Gart [30], we let $N_t, N_c$ denote the sample sizes in the treatment and control groups respectively, and $X_t, X_c$ denote the numbers of dead animals in these groups. Conditional on $X_t + X_c = m$, the difference between the proportions dead in the treatment group and in the control group is estimated by

$$\hat p_t - \hat p_c = \frac{m - X_c}{N_t} - \frac{X_c}{N_c}$$

Let $x_L, x_U$ denote the numbers of dead control animals that would correspond to the lower and upper 95% confidence limits $\rho_L, \rho_U$ on the odds ratio. Then

$$\left( \frac{m-x_U}{N_t} - \frac{x_U}{N_c},\ \frac{m-x_L}{N_t} - \frac{x_L}{N_c} \right)$$

constitutes an exact, small sample $1-\alpha$ level two sided confidence interval on $p_t - p_c$. Substituting the values of $N_c, N_t, m, x_L, x_U$ appropriate for the treatment group-control group comparisons in Goulden's Isophorone test, we obtain:

Group 4 vs Control ($m = 5$, $N_1 = N_4 = 15$, $x_L = 0.3008$, $x_U = 4.1614$): 95% confidence interval on $p_4 - p_1$ is (-0.22, 0.29) and the point estimate is 0.067. Thus there is no statistical evidence that $p_1$ and $p_4$ differ.

Group 5 vs Control ($m = 10$, $N_1 = N_5 = 15$, $x_L = 0.3168$, $x_U = 5.0019$): 95% confidence interval on $p_5 - p_1$ is (-0.0003, 0.62) and the point estimate is 0.40. Thus $p_5$ is borderline significantly greater than $p_1$ at the one sided 0.025 level (the lower endpoint is essentially 0), but we cannot determine the difference very precisely.

Group 6 vs Control ($m = 15$, $N_1 = N_6 = 15$, $x_L = 0.6063$, $x_U = 5.0076$): 95% confidence interval on $p_6 - p_1$ is (0.33, 0.92) and the point estimate is 0.73. Thus $p_6$ is significantly greater than $p_1$, but the difference cannot be determined very precisely.
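Once $x_L$ and $x_U$ are available, the difference interval is simple arithmetic. The minimal Python sketch below (the function name is hypothetical) takes the $x_L, x_U$ values reported above, together with the observed control deaths $X_1 = 2$ for the point estimate, and reproduces the three Goulden intervals to the precision printed.

```python
def conditional_difference_ci(m, nt, nc, x_l, x_u, xc_obs=None):
    """Interval on p_t - p_c from the control-death roots x_l, x_u.

    With x dead control animals and m - x dead treated animals, the difference
    in proportions dead is (m - x)/nt - x/nc. This decreases in x, so x_u gives
    the lower endpoint and x_l the upper endpoint.
    """
    def diff(x):
        return (m - x) / nt - x / nc

    interval = (diff(x_u), diff(x_l))
    estimate = None if xc_obs is None else diff(xc_obs)
    return interval, estimate

# Goulden Isophorone test: (group, m, x_L, x_U); N = 15 per group, X_1 = 2
for group, m, x_l, x_u in [(4, 5, 0.3008, 4.1614),
                           (5, 10, 0.3168, 5.0019),
                           (6, 15, 0.6063, 5.0076)]:
    (lo, hi), est = conditional_difference_ci(m, 15, 15, x_l, x_u, xc_obs=2)
    print(f"Group {group} vs Control: ({lo:.4f}, {hi:.4f}), estimate {est:.3f}")
```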


C.  MORTALITY— SMOOTHED  RESPONSES 


The confidence intervals in the previous subsection were based on comparisons of the actually observed response rates in the treatment and control groups, without imposing any structure on the mortality probabilities. However, a number of assumptions about the behavior of these
probabilities  with  increasing  concentration  may  be  quite  reasonable.  It 
was  seen  in  the  preliminary  plots  in  Section  II  that  the  mortality  rates 
generally  increase  with  increasing  concentration,  they  generally  increase 
in  a  smooth  manner,  and  the  trend  curves  are  generally  S-shaped.  This  is 
typical  behavior  that  is  observed  for  such  responses,  with  many  different 
compounds  and  many  different  animal  species. 

Such  structure  can  be  accounted  for  in  the  construction  of  confidence 
intervals  on  treatment  group-control  group  differences.  One  way  to  do  this 
is  to  fit  regression  models  to  describe  the  trends  in  mortality  rates  and 
base  confidence  interval  calculations  on  predictions  from  these  models. 

One commonly used model is the probit model. Probit models in concentration or in log concentration were fitted to the mortality data in Section
X.  In  this  section  we  use  the  results  from  these  probit  fits  to  construct 
confidence  intervals  on  pairwise  differences  between  the  treatment  group 
and  control  group  mortality  rates.  Such  confidence  intervals  would  be 
expected  to  be  more  precise  than  the  intervals  constructed  in  the  previous 
subsections.  However  they  are  based  on  a  greater  number  of  assumptions.  A 
detailed  comparison  of  the  theoretical  properties  of  the  confidence 
intervals  based  on  the  smoothed  probability  estimates  with  those  based  on 
the  unsmoothed  estimates  might  be  carried  out,  but  is  beyond  the  scope  of 
this  report. 


It  should  be  noted  that  the  inferences  discussed  in  this  subsection  are 
based  on  asymptotic  theory.  Since  the  predictions  of  the  regression  models 
are  based  on  averaging  responses  from  all  the  groups  in  various  ways,  it 
might  be  expected  that  the  asymptotic  theory  is  more  valid  for  the  smoothed 
estimates than for the unsmoothed estimates. This too needs to be investigated in greater detail.


The  confidence  intervals  in  this  subsection  were  constructed  to  compare 
differences  between  treatment  group  and  control  group  mortality  rates. 
Directly  analogous  procedures  can  be  used  to  construct  confidence  intervals 
on  the  ratios  of  these  rates. 


The confidence intervals are based on the standard three-parameter probit models (in either logarithmic or untransformed concentration levels) that were fitted to the data in Section X. These models can be expressed as

$$p(\mathrm{conc}) = p_0 + (1-p_0)\,\Phi\big(\beta_0 + \beta_1 (z - m)\big)$$

where $p_0$ and $p(\mathrm{conc})$ are the background mortality rate and the rate at concentration conc respectively, $z$ is either concentration or $\log_{10}$(concentration), $m$ is a fixed centering constant, $\Phi(\cdot)$ is the standard normal c.d.f., and $p_0, \beta_0, \beta_1$ are unknown parameters to be estimated from the model fit to the data. The confidence interval calculations are based on the estimated parameters from this model and their estimated variances and covariances. The details of the procedure are contained in Appendix AXIII. We apply the procedure below to the 21-day mortality responses from several of the data sets.


LeBlanc  Test  A — Solvent  Control  Group — Logarithmic  Concentration 


The control group is Group 2. The differences and their estimated standard errors are

Group        3          4          5        6        7
Difference   0.0000*    0.0000*    0.004    0.419    0.905
Std Error    0.0000*    0.0000*    0.0131   0.2025   0.0224

Thus 95% confidence intervals on the treatment group-control group differences are

Group 3 vs Control (0.0000, 0.0000)

Group 4 vs Control (0.0000, 0.0000)

Group 5 vs Control (-0.021, 0.030)

Group 6 vs Control (0.022, 0.816)

Group 7 vs Control (0.861, 0.949)

There is thus statistical evidence that Groups 6 and 7 have greater mortality rates than the control group. However, the extent of these differences cannot be well determined. Note that the Group 6 vs Control confidence interval on $p_6 - p_2$ above gives a somewhat different impression of the relation between these probabilities than the asymptotic confidence interval on $p_6/p_2$ in Subsection B. The interval in this subsection suggests less information about the extent of difference between $p_6$ and $p_2$ than does the interval in Subsection B. Most of the discrepancy is due to the fact that the smoothed background mortality estimate is 0.083, while the unsmoothed estimate is 0.038. The remainder of the discrepancy is probably due to slightly different assumptions built into the asymptotic distributions that were used. While both intervals lead to the same qualitative conclusions, the implications of the different interpretations of the lower endpoints need to be investigated further. Subjectively, I would prefer the interval in this subsection, since it uses more information from the data.


LeBlanc  Test  B — Combined  Control  Groups — Untransformed  Concentration 


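Here $z = \mathrm{conc}$, $m = 0.0250$. From Figure X.8 we obtain

$\hat p_0 = 0.10944$, $\hat\beta_0 = -3.37287$, $\hat\beta_1 = 15.9695$

$\hat\sigma(\hat p_0) = 0.020227$, $\hat\sigma(\hat\beta_0) = 2.80322$, $\hat\sigma(\hat\beta_1) = 12.9349$

$$R = \begin{pmatrix} 1.0000 & -0.5474 & 0.5291 \\ -0.5474 & 1.0000 & -0.9868 \\ 0.5291 & -0.9868 & 1.0000 \end{pmatrix}$$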

The  slope  and  intercept  are  not  well  determined  in  this  example.  Their 
estimates  are  highly  intercorrelated.  This  is  undoubtedly  due  to  the 
nearly  constant  dose  response  relation,  except  for  Group  7. 


*  These  values  are  zero  to  at  least  four  decimal  places. 


The control group is Groups 1 and 2 combined. The differences and their estimated standard errors are

Group        3        4        5        6        7
Difference   0.0001   0.0003   0.0017   0.0096   0.5022
Std Error    0.0011   0.0031   0.0124   0.0489   0.1637

Except for Group 7, the treatment group-control group differences are negligible. 95% confidence intervals on the treatment group-control group differences are

Group  3  vs  Control  (-0.002,0.002) 

Group  4  vs  Control  (-0.006,0.006) 

Group  5  vs  Control  (-0.023,0.026) 

Group  6  vs  Control  (-0.086,0.105) 

Group  7  vs  Control  (0.181,0.823) 

There is thus statistical evidence that Group 7 has a greater mortality rate than the control group. However, the extent of this difference cannot be well determined.


Goulden  Isophorone — Logarithmic  Concentration 


Here $z = \log_{10}(\mathrm{conc})$, $m = 1.8352$. From Figure X.11 we obtain

$\hat p_0 = 0.09058$, $\hat\beta_0 = -2.50889$, $\hat\beta_1 = 7.50132$

$\hat\sigma(\hat p_0) = 0.04258$, $\hat\sigma(\hat\beta_0) = 0.95596$, $\hat\sigma(\hat\beta_1) = 2.51776$

$$R = \begin{pmatrix} 1.0000 & -0.3238 & 0.2590 \\ -0.3238 & 1.0000 & -0.9633 \\ 0.2590 & -0.9633 & 1.0000 \end{pmatrix}$$

The control group is Group 1. The differences and their estimated standard errors are

Group        2        3        4        5        6
Difference   0.0000   0.0002   0.0924   0.4722   0.7619
Std Error    0.0000   0.0009   0.0932   0.1051   0.0884

95% confidence intervals on the treatment group-control group differences are

Group 2 vs Control (negligible)
Group 3 vs Control (-0.002, 0.002)
Group 4 vs Control (-0.090, 0.275)
Group 5 vs Control (0.266, 0.678)
Group 6 vs Control (0.589, 0.935)


There  is  statistical  evidence  that  Groups  5  and  6  have  greater  mortality 
rates  than  the  control  group.  However  the  extent  of  these  differences 
cannot  be  well  determined. 
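The calculation behind these smoothed intervals can be sketched with the delta method. The Python sketch below is an assumption-based reconstruction, not the Appendix AXIII procedure itself: it takes the smoothed treatment-control difference to be $(1-\hat p_0)\,\Phi(\hat\beta_0 + \hat\beta_1(z-m))$, i.e., the fitted rate minus the background rate $\hat p_0$; it approximates the variance from the gradient and from the covariance matrix built out of the standard errors and the correlation matrix $R$; and it uses the normal factor 1.96. With the Goulden estimates above, it gives the Group 6 entries (difference about 0.7619, standard error about 0.0884, interval about (0.589, 0.935)) when Group 6 is placed at a concentration of 200. That concentration is not stated in this section; it was back-calculated from the reported difference, so treat it as hypothetical. The function and variable names are likewise hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def probit_difference_ci(z, m, est, se, corr, zcrit=1.96):
    """Delta-method CI on p(conc) - p0 for p = p0 + (1 - p0) * Phi(b0 + b1*(z - m)).

    est  : estimates (p0, b0, b1)
    se   : their standard errors
    corr : 3x3 correlation matrix of the estimates
    """
    p0, b0, b1 = est
    eta = b0 + b1 * (z - m)
    diff = (1.0 - p0) * norm_cdf(eta)
    # partial derivatives of diff with respect to p0, b0, b1
    grad = (-norm_cdf(eta),
            (1.0 - p0) * norm_pdf(eta),
            (1.0 - p0) * norm_pdf(eta) * (z - m))
    # grad' * Cov * grad, with Cov[i][j] = se[i] * se[j] * corr[i][j]
    var = sum(grad[i] * grad[j] * se[i] * se[j] * corr[i][j]
              for i in range(3) for j in range(3))
    sd = math.sqrt(var)
    return diff, sd, (diff - zcrit * sd, diff + zcrit * sd)

# Goulden Isophorone test, logarithmic concentration fit (values from the text)
EST = (0.09058, -2.50889, 7.50132)
SE = (0.04258, 0.95596, 2.51776)
R = ((1.0000, -0.3238, 0.2590),
     (-0.3238, 1.0000, -0.9633),
     (0.2590, -0.9633, 1.0000))
z6 = math.log10(200.0)  # hypothetical Group 6 concentration (back-calculated)
d, s, (lo, hi) = probit_difference_ci(z6, 1.8352, EST, SE, R)
print(f"difference {d:.4f}, std error {s:.4f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Because $\hat\beta_0$ and $\hat\beta_1$ are strongly negatively correlated, the cross term in the variance largely cancels the two squared terms; omitting the correlations would give a much wider interval.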


It is interesting to compare these intervals with those obtained in Subsection B, based on exact, small sample theory using the unsmoothed mortality estimates. Direct comparisons are available for Groups 4, 5, and 6 in Goulden's Isophorone test.

                     Exact, Small Sample         Asymptotic Confidence
                     Confidence Intervals,       Intervals, Probit Based
                     Unsmoothed Mortality        Mortality Estimates
                     Estimates
                     (interval), estimate        (interval), estimate

Group 4 vs Control   (-0.22, 0.29), 0.067        (-0.090, 0.275), 0.092
Group 5 vs Control   (-0.0003, 0.62), 0.40       (0.266, 0.678), 0.47
Group 6 vs Control   (0.33, 0.92), 0.73          (0.589, 0.935), 0.76


The smoothed and unsmoothed point estimates are similar. However, the asymptotic confidence intervals are much shorter than the small sample confidence intervals. The discrepancies occur mainly at the lower endpoints of these intervals. The reason for the good agreement of the upper confidence bounds but poor agreement of the lower confidence bounds is not well understood and should be studied further.


D.  LENGTH— UNSMOOTHED  RESPONSES 


Subsections B and C were concerned with various approaches to constructing confidence intervals for the comparisons of treatment group and
control  group  mortality  rates.  Confidence  intervals  can  also  be 
constructed  to  compare  length  and  reproduction  responses  between  treatment 
and  control  groups.  Since  the  procedures  for  length  and  for  reproduction 
are  quite  similar,  we  consider  lengths  only.  In  this  subsection  we  work 
with  unsmoothed  lengths  and  in  the  next  subsection  we  work  with  lengths 
smoothed  by  regression  models.  Recall  that  in  several  data  sets  we  deleted 
groups  from  the  length  and  reproduction  comparisons  because  of  excessive 
mortality.  This  includes  Group  7  in  LeBlanc's  Test  A  and  Group  6  in 
Goulden's  Isophorone  test. 


Nonsimultaneous  confidence  intervals  are  based  on  the  t-distribution. 
Adjustments  for  simultaneity  can  be  based  on  Dunnett's  procedure  in  the 
balanced  case  (i.e.,  equal  numbers  of  beakers  per  group  and  equal  numbers 
of  daphnids  per  beaker)  or  more  generally  on  Bonferroni's  procedure.  We 
will  not  pursue  either  procedure  here.  We  consider  several  examples. 


LeBlanc  Test  A — Solvent  Control  Group — Group  7  Deleted 


We use the average lengths and estimated standard errors contained in Subsection XI.C. The standard error estimates are based on the interaction mean square in the analysis of variance fit discussed in Subsections IV-A and V-D. The average lengths and estimated standard errors are (see Subsection V-D for details)

Group        2 (Control)   3        4        5        6
Average      5.2753        4.9743   5.1027   4.9658   5.0000
Std Error    0.059         0.062    0.061    0.061    0.080


We associate 98 degrees of freedom with the interaction mean square (after augmenting it with information from the within beaker mean square).

The $1-\alpha$ level two sided nonsimultaneous confidence intervals are

$$\bar Y_t - \bar Y_c \pm t_0\,\big(\hat\sigma_t^2 + \hat\sigma_c^2\big)^{1/2}$$

where $\bar Y_t, \bar Y_c$ are the average lengths, $t_0$ is the upper $\alpha/2$ point of the t-distribution with 98 degrees of freedom, and $\hat\sigma_t, \hat\sigma_c$ are the estimated standard errors of the treatment and control group averages respectively. If $1-\alpha = 0.95$, the t factor is 1.99. Thus the confidence intervals are


Group  3  vs  Control  (-0.471,-0.131) 

Group  4  vs  Control  (-0.341,-0.004) 

Group  5  vs  Control  (-0.478,-0.141) 

Group  6  vs  Control  (-0.473,-0.077) 

Since none of these intervals contains 0, they provide statistical evidence that the average 21-day length in the solvent control group is significantly greater than those in the treatment groups. The estimated differences lie between a tenth and a half of a millimeter (0.17 to 0.31 mm). Whether or not such reductions in lengths are of biological significance is a separate issue. This result is in direct agreement with the appearance of Figure II.26 and with the multiple comparisons calculations carried out in Subsection XI.C.
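The nonsimultaneous intervals follow directly from the averages and standard errors in the table. The minimal Python sketch below (names hypothetical) passes the t factor in rather than computing it, since the text quotes it (1.99 for 98 degrees of freedom). It treats the treatment and control averages as independent, as the formula above does.

```python
import math

def nonsimultaneous_ci(mean_t, se_t, mean_c, se_c, t_factor):
    """Two sided CI on a treatment-minus-control difference of independent averages."""
    diff = mean_t - mean_c
    half_width = t_factor * math.sqrt(se_t ** 2 + se_c ** 2)
    return diff - half_width, diff + half_width

# LeBlanc Test A, 21-day lengths; the control is the solvent control, Group 2
control = (5.2753, 0.059)
treatments = {3: (4.9743, 0.062), 4: (5.1027, 0.061),
              5: (4.9658, 0.061), 6: (5.0000, 0.080)}
for group, (mean, se) in treatments.items():
    lo, hi = nonsimultaneous_ci(mean, se, *control, t_factor=1.99)
    print(f"Group {group} vs Control ({lo:.3f}, {hi:.3f})")
```

Called with `t_factor=2.00` and the Chapman beryllium averages and standard errors below, the same function gives the intervals listed there, for example (-0.945, -0.431) for Group 7.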


Chapman — Beryllium — Combined  Control  Groups 


Analyses  are  carried  out  only  on  lengths  corresponding  to  daphnids  that 
survived  to  the  end  of  the  test.  Since  the  test  was  carried  out  with  just 
one  daphnid  per  beaker,  there  is  no  complication  due  to  beaker-to-beaker 
variation  within  groups.  This  component  of  variation  is  confounded  with 
the  daphnid-to-daphnid  variation. 


The estimated standard deviation, based on the residual sum of squares from a one-way analysis of variance (not shown), is $\hat\sigma = 0.325$ with 56 degrees of freedom. The average lengths per group, numbers of daphnids, and standard errors are

Group        Control (1 and 2)   3       4       5       6       7       8
Number       18                  8       6       6       9       10      6
Average      4.278               4.125   4.000   3.817   3.900   3.590   3.650
Std Error    0.077               0.115   0.133   0.133   0.108   0.103   0.133


Nonsimultaneous 95% confidence intervals are calculated just as in the LeBlanc Test A example. These confidence intervals are

Group 3 vs Control (-0.429, 0.124)

Group 4 vs Control (-0.585, 0.029)

Group 5 vs Control (-0.768, -0.154)

Group 6 vs Control (-0.643, -0.113)

Group 7 vs Control (-0.945, -0.431)

Group 8 vs Control (-0.935, -0.321)


These intervals provide statistical evidence that the average lengths in Groups 5-8 are significantly lower than that in the control group. This is in reasonable agreement with the appearance of Figure II.32.


E.  LENGTH— SMOOTHED  RESPONSES 


The confidence intervals in the previous subsection were based on comparisons of the actually observed average lengths within each group, without imposing any structure on these averages. Regression models that assumed smooth trends in average lengths with increasing concentration were fitted to the lengths in Subsection XII.C. It is interesting to note that these trends are not necessarily monotone; that is, low toxicant concentrations sometimes enhance average lengths.

The confidence intervals in this subsection were constructed to compare differences between treatment group and control group average lengths.

They are based on polynomial regression models (linear or quadratic) in $\log_{10}$(concentration). Since the control group is generally far removed from the treatment groups in terms of $\log_{10}$(concentration), it was felt advisable to fit the polynomials only to the treatment group responses.

The control group responses are unadjusted. The specific form of the models fitted is discussed in detail in Subsection XII.A. We utilize the results of these models in the calculations below.

We  illustrate  the  construction  of  confidence  intervals  on  treatment 
group-control  group  average  differences  with  several  examples. 


LeBlanc  Test  A — Solvent  Control  Group--Group  7  Deleted 


There are four treatment groups and a control group. A quadratic regression model in $\log_{10}$(concentration) was fitted to the treatment group responses. The results are shown in Figure XII.17. The quadratic term in Figure XII.17 has the wrong sign (positive), is small, and is nonsignificant. The plot of length vs group in Figure II.26 shows no trend among the treatment groups. The quadratic term was therefore deleted from the model, and a straight line trend was fitted to the treatment groups. The results are shown in Figure XII.19. The slope coefficient is essentially zero and is nonsignificant. Based on this and the appearance of Figure II.26, we conclude that there is no trend in average lengths among the four treatment groups. We thus calculate an overall average length across the treatment
groups for comparison with that in the control group. The average length among the 256 surviving daphnids in the 4 treatment groups is 5.0125 mm, with a standard error (based on the overall interaction mean square) of 0.0326. (The average of the 16 beaker averages is 5.0231 with a standard error of 0.035, based on these 16 values.) The average length among the 77 surviving daphnids in the solvent control group is 5.2753 mm, with a standard error (based on the overall interaction mean square) of 0.059.

(The average of the four beaker averages is 5.2775 mm with a standard error of 0.029, based on those 4 values. The averages are similar, but the standard errors differ. We use the larger value below, but this should be considered further.)

A 95% confidence interval on the difference of the treatment group and solvent control group average lengths is

$$(5.0125 - 5.2753) \pm 2\,\big(0.0326^2 + 0.059^2\big)^{1/2} = (-0.40,\ -0.13)$$

This interval is tighter than the individual confidence intervals calculated in the previous subsection, and it summarizes the trend information
more  succinctly.  We  feel  that  this  is  a  better  representation  of  the 
results. 


Chapman — Beryllium — Combined  Control  Groups 


There are six treatment groups and a (combined) control group.

Analyses are carried out only on lengths corresponding to daphnids that survived to the end of the test. A quadratic trend model in $\log_{10}$(concentration) was fitted to the treatment group responses. The results of the fit are shown in Figure XII.21. It is evident from Figures XII.21 and XII.22 that the quadratic trend is negligible. The quadratic term was thus deleted from the model, and a straight line model was fitted to the data. The results of this fit are shown in Figure XII.23. The linear term is highly significant, and we use this model to construct confidence intervals.

The parameterization of the regression model is discussed in detail in Section XII. The portion $\beta_0 + \beta_1 x$ of the model represents the difference between the predicted average length at $x$ ($= \log_{10}$(concentration)) and the average length in the control group. This is estimated for the $i$-th treatment group as $\hat\beta_0 + \hat\beta_1 x_i$, where $\hat\beta_0, \hat\beta_1$ are the I and IX coefficients in Figure XII.23 and $x_i$ is 0.4150, 0.7931, 1.1377, 1.4294, 1.7370, 2.0456 for treatment Groups 3, 4, 5, 6, 7, 8 respectively. The standard error of $\hat\beta_0 + \hat\beta_1 x_i$ is estimated as

$$\hat\sigma_i = \Big[(1, x_i)\,\hat\Sigma\,(1, x_i)'\Big]^{1/2}$$

where $\hat\Sigma$ is the estimated variance-covariance matrix given at the bottom of Figure XII.23. A $1-\alpha$ two sided nonsimultaneous confidence interval on the treatment-control difference is $\hat\beta_0 + \hat\beta_1 x_i \pm t\,\hat\sigma_i$, where $t$ is the upper $\alpha/2$ percentile of the t-distribution with the appropriate number of degrees of freedom. In our case $\alpha = 0.05$, d.f. = 56, and $t = 2.00$. The confidence intervals on the differences for the individual treatment groups are:


Group  3  vs  Control  (-0.391,0.078) 

Group  4  vs  Control  (-0.478,-0.082) 

Group  5  vs  Control  (-0.573,-0.212) 

Group  6  vs  Control  (-0.699,-0.307) 

Group  7  vs  Control  (-0.785,-0.392) 

Group  8  vs  Control  (-0.913,-0.465) 

These  intervals  are  qualitatively  similar  to  the  unsmoothed  intervals 
calculated  in  the  previous  subsection.  However  they  are  somewhat  shorter, 
since  they  incorporate  information  from  all  the  groups.  Groups  5-8  are 
statistically  significantly  lower  than  the  control  group.  Group  4  is  seen 
to  be  borderline  significant  on  the  basis  of  either  the  smoothed  or 
unsmoothed  intervals. 
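The smoothed interval $\hat\beta_0 + \hat\beta_1 x_i \pm t\,\hat\sigma_i$, with $\hat\sigma_i = [(1, x_i)\hat\Sigma(1, x_i)']^{1/2}$, takes only a few lines of Python. In the sketch below (names hypothetical), only the $x_i$ values and $t = 2.00$ come from the text. The coefficients `B0`, `B1` and the covariance matrix `COV` are hypothetical placeholders, not the Figure XII.23 estimates, which are not reproduced in this section. Substituting the actual Figure XII.23 values should give the intervals listed above.

```python
import math

def regression_difference_ci(b0, b1, cov, x, t_factor):
    """CI on b0 + b1*x given the 2x2 covariance matrix of (b0, b1)."""
    est = b0 + b1 * x
    # (1, x) Sigma (1, x)' written out for a 2x2 matrix
    var = cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    se = math.sqrt(var)
    return est, se, (est - t_factor * se, est + t_factor * se)

# x_i = log10(concentration) for treatment Groups 3-8 (from the text)
X = {3: 0.4150, 4: 0.7931, 5: 1.1377, 6: 1.4294, 7: 1.7370, 8: 2.0456}
B0, B1 = -0.02, -0.33                          # hypothetical coefficients
COV = ((0.0040, -0.0025), (-0.0025, 0.0018))   # hypothetical covariance matrix
for group, x in X.items():
    est, se, (lo, hi) = regression_difference_ci(B0, B1, COV, x, t_factor=2.00)
    print(f"Group {group} vs Control ({lo:.3f}, {hi:.3f})")
```

Working from the full covariance matrix, rather than from the two standard errors alone, matters here because the intercept and slope estimates of a straight line fit are usually correlated.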


XIV.  EXPERIMENTAL  DESIGN  CONSIDERATIONS 


In this section we consider a number of aspects of the design of chronic Daphnia toxicity tests. Topics discussed include limitations on test size, precision to be expected as a function of test size, allocation of daphnids among beakers and beakers among groups, criteria for judging the adequacy of a test, and blocking considerations. Feder and Collins [1], Section XIII, discuss some experimental design issues related to early life stage tests with fathead minnows. Portions of the discussion in this section extend that material and adapt it to Daphnia tests.


A.  INTRODUCTION 


The  principal  limitations  on  the  size  of  a  toxicity  test  are  the  amount 
of  dilution  water  that  can  be  supplied  by  the  particular  proportional 
diluter  being  used,  the  number  of  replicate  beakers  per  group  into  which 
this  water  can  practicably  be  split  in  a  uniform  manner,  and  the  maximum 
number  of  daphnids  that  can  be  accommodated  in  the  available  amount  of 
water  and  that  can  be  monitored  for  survival,  reproduction,  and  length  with 
the  available  amount  of  laboratory  personnel  and  technology.  The  extent  of 
these  limitations  depends  on  the  available  equipment  and  people  resources 
and  this  will  vary  considerably  among  test  facilities.  Thus  no  absolutes 
can  be  stated  about  universal  numbers  of  daphnids,  numbers  of  beakers, 
etc.,  that  need  be  used.  However,  we  can  estimate  the  precision  to  be 
expected  as  a  function  of  test  size  and  we  can  state  some  general 
guidelines  for  allocating  daphnids  among  beakers  and  allocating  beakers 
among  groups. 


B.  ALLOCATION  OF  DAPHNIDS  AMONG  BEAKERS  WITHIN  GROUPS 


Experimental  equipment,  facilities,  and  technique  place  upper  bounds  on 
the  numbers  of  beakers  that  can  be  used  in  each  treatment  or  control  group. 
Water  supply  and  technician  availability  place  upper  limits  on  the  number 
of Daphnia that can be used. We recommend that as many daphnids as
feasible be used for studying toxicant effects on survival. Furthermore, as
many beakers as possible should be used within each test group, so that the
number of daphnids per beaker can be made as small as possible. However,
equal numbers of daphnids should be allocated to each beaker within each
test group, so as to keep effects of competition, food and oxygen supply,
contagion, and handling as constant as possible across beakers.

Utilizing  larger  numbers  of  replicate  beakers  per  group  with  smaller 
numbers  of  daphnids  per  beaker  has  many  advantages.  Some  of  these  are: 

1. There will be less competition among the daphnids within each
beaker. This should result in healthier daphnids, should reduce
mortality and morbidity not related to the toxicant, and should thereby
improve sensitivity for determining toxicant-related effects on
survival, growth, and reproduction.

2. There will be less contagion among the daphnids within each beaker.
Thus fungi, bacteria, etc., that invade a beaker will affect
fewer daphnids. Similarly, any fluctuations in experimental
conditions in the beaker will affect fewer daphnids.

3. Estimates of effects within groups are based on averages of the
observed average effects within each beaker. The presence of
beaker-to-beaker heterogeneity degrades the precision of such
averages. However, averaging over replicate beakers tends to
balance out the beaker-to-beaker heterogeneity. Another way of
saying this is that for a given number of daphnids on test per
group and a given degree of beaker-to-beaker heterogeneity per
group, the more beakers and the fewer daphnids per beaker that are
used, the more precise the estimates will be. In the terminology
used in previous sections, the effective sample size per group
increases as the number of beakers increases.

4.  The  number  of  degrees  of  freedom  for  estimating  the  extent  of 
beaker-to-beaker  variation  within  groups  increases  with  the  number 
of  beakers.  Thus  not  only  is  the  effective  sample  size  and 
therefore  the  precision  increased,  but  the  ability  to  estimate  that 
precision  is  improved.  Feder  and  Collins  [1] ,  Section  XVIII 
recommend  that  at  least  12  degrees  of  freedom  be  available  for 
estimating  variability. 

The  principal  limitations  on  the  numbers  of  beakers  that  can  be  used 
per  group  are  the  ability  to  deliver  uniform  water  quantity  and  quality  to 
large  numbers  of  beakers  and  to  maintain  the  beakers  under  uniform 
laboratory  test  conditions.  These  problems  increase  as  the  size  of  the 
test  increases. 

We  now  perform  calculations  that  illustrate  how  effective  sample  sizes 
per  group  depend  on  the  allocation  of  beakers  and  daphnids  within  each 
group.  We  illustrate  such  calculations  for  survival,  length,  and 
reproduction  responses. 

Suppose that a particular test (or control) group contains J beakers
and n daphnids per beaker. The total sample size is N = Jn. We calculate
effective sample sizes as J (and therefore n) varies with N fixed.

Effective sample size calculations for determining survival rates are based
on Williams' beta binomial model [3], while effective sample sizes for
determining average lengths or reproduction are based on components of
variance calculations.

We first consider survival. Following the discussion in Feder and Collins
[1], Subsection XVIIIB, we let X_ij denote the number of dead daphnids (e.g.,
after 21 days) in beaker j of group i. For notational convenience we
suppress the subscript i in the discussion below, so that X_ij is referred to
as X_j. We assume X_j is binomially distributed with parameters (n, p_j), and
p_j in turn has a beta distribution with parameters (α, β). This model
allows for random variation of mortality rates among the beakers within
each group and thereby quantifies the extent of beaker-to-beaker
heterogeneity.

TABLE XIV.1 EFFECTIVE SAMPLE SIZE PER GROUP AS A FUNCTION OF NUMBER OF DAPHNIDS (N),
NUMBER OF BEAKERS (J), AND DEGREE OF BEAKER-TO-BEAKER HETEROGENEITY (θ)

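Let

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \theta = \frac{1}{\alpha + \beta}, \qquad 0 < \mu < 1, \quad 0 \le \theta < \infty.$$

Then

$$E(p_j) = \mu, \qquad \mathrm{Var}(p_j) = \frac{\mu(1 - \mu)\,\theta}{1 + \theta}.$$

The unconditional distribution of X_j, accounting for the random variation
of p_j, is beta binomial with mean and variance

$$E(X_j) = n\mu, \qquad \mathrm{Var}(X_j) = n\mu(1 - \mu)\,\frac{1 + n\theta}{1 + \theta}.$$

Thus, for the pooled group mortality rate p̂ = Σ_j X_j / N,

$$E(\hat p) = \mu, \qquad \mathrm{Var}(\hat p) = \frac{\mu(1 - \mu)}{N} \cdot \frac{1 + n\theta}{1 + \theta}.$$

In Subsection V-B we defined the variance inflation factor as

$$K = \frac{\mathrm{Var}(\hat p)}{\mu(1 - \mu)/N}$$

and the effective sample size as

$$N_{\mathrm{eff}} = N/K.$$

Thus, under the assumptions of the beta binomial model,

$$K = \frac{1 + n\theta}{1 + \theta} \qquad \text{and} \qquad N_{\mathrm{eff}} = N\,\frac{1 + \theta}{1 + n\theta}.$$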
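The variance inflation factor K = (1 + nθ)/(1 + θ) can be checked by simulation. The sketch below is a minimal illustration that assumes the beta binomial model exactly as stated: beaker mortality rates are drawn from a beta distribution with α = μ/θ and β = (1 − μ)/θ (which gives μ = α/(α + β) and θ = 1/(α + β)), and deaths are binomial given the beaker rate. The function names and the values of μ, θ, J, and n are illustrative choices, not taken from the report.

```python
import random
import statistics


def simulate_pooled_rate(mu, theta, J, n, rng):
    """Pooled mortality rate for one simulated group of J beakers with n daphnids each.

    Assumes the beta binomial model: p_j ~ Beta(mu/theta, (1 - mu)/theta), so that
    mu = alpha/(alpha + beta) and theta = 1/(alpha + beta); requires theta > 0.
    """
    a, b = mu / theta, (1.0 - mu) / theta
    deaths = 0
    for _ in range(J):
        p_j = rng.betavariate(a, b)
        # Binomial(n, p_j) draw as a count of n Bernoulli trials (standard library only).
        deaths += sum(rng.random() < p_j for _ in range(n))
    return deaths / (J * n)


def inflation_factor_theory(n, theta):
    """K = (1 + n*theta) / (1 + theta) under the beta binomial model."""
    return (1 + n * theta) / (1 + theta)


def inflation_factor_simulated(mu, theta, J, n, reps=10000, seed=1):
    """Monte Carlo estimate of K = Var(p_hat) / (mu(1 - mu)/N), with N = J*n."""
    rng = random.Random(seed)
    rates = [simulate_pooled_rate(mu, theta, J, n, rng) for _ in range(reps)]
    return statistics.variance(rates) / (mu * (1 - mu) / (J * n))


if __name__ == "__main__":
    # Illustrative values resembling a LeBlanc-style group: J = 4 beakers of n = 20.
    print("theory:", round(inflation_factor_theory(20, 0.71), 2))
    print("simulated:", round(inflation_factor_simulated(0.10, 0.71, 4, 20), 2))
```

Because the sampling is random, the simulated K agrees with the closed form only approximately; increasing `reps` tightens the agreement.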

The parameter θ quantifies the extent of beaker-to-beaker heterogeneity
within groups. As θ approaches 0, there is less and less beaker-to-beaker
heterogeneity, and so N_eff approaches N, the number of daphnids. As θ
approaches infinity, there is more and more beaker-to-beaker heterogeneity,
and so N_eff approaches J, the number of beakers. The general situation is
a compromise between these two extremes. We also see that for fixed N and
fixed θ, N_eff increases as n decreases (i.e., as J increases). We tabulate
values of N_eff in Table XIV.1, corresponding to various combinations of N,
J, and θ. Table XIV.1 shows that, depending on the specific combination of
N, J, and θ, the effective sample size can be anywhere between J and N. This
has important implications for the sensitivity of the test. The added
precision due to having additional beakers is most pronounced when the extent
of beaker-to-beaker heterogeneity is relatively large and the number of
beakers per group is relatively small.
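The closed form N_eff = N(1 + θ)/(1 + nθ) is easy to tabulate. The following is a minimal sketch (the function name is hypothetical) that computes the kind of entries shown in Table XIV.1 for N = 75; it reproduces the values quoted for that case in this subsection.

```python
def effective_sample_size(N, J, theta):
    """N_eff = N (1 + theta) / (1 + n theta), with n = N / J daphnids per beaker."""
    if N % J:
        raise ValueError("N must be a multiple of J (equal daphnids per beaker)")
    n = N // J
    return N * (1 + theta) / (1 + n * theta)


if __name__ == "__main__":
    N = 75
    beaker_counts = (3, 5, 15, 25)
    for theta in (0.0, 0.02, 0.75, 2.0):
        row = [round(effective_sample_size(N, J, theta), 1) for J in beaker_counts]
        print(f"theta = {theta:<4}  J = {beaker_counts}: N_eff = {row}")
```

The θ = 0 row equals N for every J, while large θ pulls N_eff toward J, which is the limiting behavior described above.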

What values of θ arise in Daphnia toxicity tests? A partial answer to
this question can be obtained by reference to several of the data sets
analyzed in previous sections. No evidence of beaker-to-beaker
heterogeneity was present in either Adams' selenium test or Goulden's
isophorone test. However, the relatively small group sizes in these tests
(J = 3, n = 5) did not permit sensitive determinations of such heterogeneity.
LeBlanc's Tests A and B had larger group sizes (J = 4, n = 20). Both of these
tests showed strong statistical evidence of beaker-to-beaker heterogeneity
within groups (see Subsection IIX-B). It was estimated in Subsection V-B
that the variance inflation factors are:

LeBlanc Test A    K = 1.35    for Groups 1-5, 7
                  K = 13.76   for Group 6

LeBlanc Test B    K = 1.39    for Groups 1-6
                  K = 8.91    for Group 7

Using the relation K = (1 + nθ)/(1 + θ) and n = 20, we obtain the following
estimates of θ:

LeBlanc Test A    θ = 0.02    for Groups 1-5, 7
                  θ = 2.04    for Group 6

LeBlanc Test B    θ = 0.02    for Groups 1-6
                  θ = 0.71    for Group 7
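Solving K = (1 + nθ)/(1 + θ) for θ gives θ = (K − 1)/(n − K), which is finite and nonnegative for 1 ≤ K < n. The short sketch below (the helper name is hypothetical) recovers the θ estimates listed above from the reported inflation factors.

```python
def theta_from_inflation(K, n):
    """Invert K = (1 + n*theta)/(1 + theta): theta = (K - 1)/(n - K), for 1 <= K < n."""
    if not 1 <= K < n:
        raise ValueError("need 1 <= K < n for a finite, nonnegative theta")
    return (K - 1) / (n - K)


if __name__ == "__main__":
    reported = {
        "Test A, Groups 1-5, 7": 1.35,
        "Test A, Group 6": 13.76,
        "Test B, Groups 1-6": 1.39,
        "Test B, Group 7": 8.91,
    }
    for label, K in reported.items():
        print(f"{label}: theta = {theta_from_inflation(K, n=20):.2f}")
```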

Referring to the portion of Table XIV.1 corresponding to N = 75 and θ = 0.02,
we see that N_eff varies moderately as J varies between 3 and 25. The
effective sample size at J = 25 is about 40 percent greater than that at
J = 3, based, of course, on the same number of daphnids tested. The situation
for θ = 0.75 or θ = 2.00 is much more extreme. The effective sample size
varies between 6.7 and 40.4 for θ = 0.75 and between 4.4 and 32.1 for θ = 2.0
as J varies between 3 and 25. Thus allocations with more beakers and fewer
daphnids per beaker would have greatly increased the effective sample sizes
in Group 6 of Test A and in Group 7 of Test B.


C.  SAMPLE  SIZE  AND  POWER  CONSIDERATIONS  FOR  QUALITATIVE  SURVIVAL  DATA 


In the previous subsection we calculated effective group sample sizes
as a function of the number of beakers per group, the number of daphnids
per beaker, and the extent of beaker-to-beaker heterogeneity. In this
subsection we calculate the power to be expected for pairwise comparisons
between treatment groups and the control group. These power calculations
are based on one-sided, two-sample hypothesis tests without any smoothing
of the treatment group responses by means of fitting regression models.
Such smoothing would undoubtedly improve the power of treatment
group-control group comparisons.

The power calculations in this subsection are an extension of those in
Feder and Collins [1], Subsection XVIII.B, with the sample sizes modified
to be more applicable to Daphnia tests. They are based on adjusting the
sample sizes within each group to effective sample sizes and then carrying
out comparisons across groups based on per-daphnid analyses. Feder and
Collins examined the consequences of allocating greater numbers of organisms
to the control group than to each of the treatment groups, since the control
group enters into each pairwise comparison while each treatment group
enters into just one. Thus extra allocation to the control group should
improve the resulting power. The calculations in Feder and Collins show
that the power is indeed improved; however, the improvement is not
great enough to warrant the additional logistical and administrative
effort involved. Thus the power calculations below are based on the
assumption of equal allocations of daphnids in each group (treatment or
control). Adjustments for simultaneous testing are ignored in the power
calculations.

The power calculations are based on the hypothesis test of H0: p = p_0 vs.
H1: p > p_0, where p and p_0 correspond to the average mortality rates in the
treatment and control groups, respectively. We adjust the sample size in each
group down to N_eff to account for beaker-to-beaker heterogeneity.

We assume that N, J, and θ are the same in each group, so that N_eff is also
the same across groups. We estimate p and p_0 by the sample mortality rates
p̂ and p̂_0, apply the variance-stabilizing transformations 2 arcsin √p̂ and
2 arcsin √p̂_0, and reject H0 at significance level α = 0.05 if

$$2\arcsin\sqrt{\hat p} - 2\arcsin\sqrt{\hat p_0} > 1.645\sqrt{2/N_{\mathrm{eff}}}.$$

The power of this test is calculated for various combinations of N_eff, p,
and p_0 that are appropriate for Daphnia tests. Based on the calculations in
Table XIV.1, N_eff varies between 1 and 100 as N, J, and θ vary over different
combinations of values. It is assumed that an infinite number of degrees of
freedom are available for error estimation.
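Under the normal approximation, each transformed rate 2 arcsin √p̂ has variance of roughly 1/N_eff whatever the true rate, so the power of the test above is approximately Φ((2 arcsin √p − 2 arcsin √p_0)/√(2/N_eff) − z_{1−α}). The sketch below assumes that approximation and, as in the text, unlimited error degrees of freedom; the function name is hypothetical and this is not the program used to build Table XIV.2.

```python
from math import asin, sqrt
from statistics import NormalDist


def arcsine_power(p0, p, n_eff, alpha=0.05):
    """Approximate power of the one-sided test of H0: p = p0 vs H1: p > p0.

    Uses 2*asin(sqrt(rate)), whose variance is roughly 1/n, so the difference
    between two groups of effective size n_eff has variance about 2/n_eff.
    """
    z = NormalDist()
    shift = 2 * asin(sqrt(p)) - 2 * asin(sqrt(p0))
    return z.cdf(shift / sqrt(2.0 / n_eff) - z.inv_cdf(1 - alpha))


if __name__ == "__main__":
    for n_eff in (6, 15, 60):
        print(n_eff,
              round(arcsine_power(0.10, 0.30, n_eff), 2),
              round(arcsine_power(0.10, 0.50, n_eff), 2))
```

For N_eff = 15 this gives power of about 0.81 for 10% versus 50% mortality but only about 0.41 for 10% versus 30%, which is the pattern discussed for the ASTM design.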

The  observed  control  group  mortality  rates  in  LeBlanc's,  Adams', 
Chapman's,  and  Goulden's  (combined)  control  groups  were  0.09,  0.13,  0.07, 
0.10,  and  0.13  respectively.  We  thus  assume  a  range  of  control  group 
mortality  rates  between  0.05  and  0.15.  The  calculations  are  shown  in  Table 
XIV. 2. 

If we (somewhat arbitrarily) regard 0.80 as a reasonable level of
power, we see that tests conducted under ASTM guidelines (i.e., 3 beakers
per group to test survival, 5 daphnids per beaker) are sensitive for
distinguishing between 10% and 50% mortality but not for distinguishing
between 10% and 30% mortality, even if there is no beaker-to-beaker
heterogeneity. Any beaker-to-beaker heterogeneity would degrade the
sensitivity of the test, since it would reduce the effective sample sizes.
For example, for θ = 0.71 (the estimated heterogeneity of Group 7 of LeBlanc
Test B), Table XIV.1 shows that the effective group size is degraded from
15 to about 6. Table XIV.2 shows that with a group size of 6, the test
cannot come close to distinguishing between 10% and 50% mortality rates.
Thus, even without taking heterogeneity into account, the ASTM guidelines
appear to be too minimal for many reasonable survival rate comparisons of
biological importance.


Note that ASTM guidelines suggest running 7 additional beakers per
group, with just one daphnid per beaker. These additional daphnids are
intended for assessing changes in productivity with increasing
concentrations. We recommend that these seven individually housed daphnids
per group not be combined with the 15 multiply housed daphnids per group
for assessing mortality rates. The mortality rates will in general differ
for singly and multiply housed daphnids. This was seen to be the case
in Goulden's data.

Chapman's  beryllium  test  was  conducted  with  10  beakers  per  group  and 
one  daphnid  per  beaker.  The  intent  of  this  design  was  to  obtain  good 
information  about  reproduction.  Table  XIV. 2  shows  that  this  test  was  too 
small  to  provide  good  sensitivity  for  inferences  concerning  mortality. 

(Note  that  the  test  was  not  designed  for  this  purpose.) 

LeBlanc's tests were somewhat larger. They consisted of N = 80 daphnids
per group, divided among J = 4 beakers. In most of the groups the degree of
beaker-to-beaker heterogeneity was small (i.e., θ = 0.02). Table XIV.1 shows
that the effective sample size per group is then reduced to about N_eff = 60.

Table XIV.2 shows that with effective group sizes of 60, we can expect to
distinguish reasonably precisely between 10% and 30% mortality. However, in
Group 6 of Test A and in Group 7 of Test B the degree of beaker-to-beaker
heterogeneity was considerably greater. The estimated values of θ for these
groups are 2.0 and 0.7, respectively. Table XIV.1 shows that with N = 80
daphnids and J = 4 beakers per group, the effective sample sizes are
approximately 6 and 9, respectively. If this degree of beaker-to-beaker
heterogeneity had been present in all the groups, then Table XIV.2 shows
that we could not expect to distinguish well even between mortality rates
of 10% and 50%.
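Chaining the two calculations shows how allocation alone changes sensitivity. The sketch below combines N_eff = N(1 + θ)/(1 + nθ) with the normal-approximation power of the one-sided arcsine test described in this subsection, for N = 80 daphnids per group. It compares LeBlanc's J = 4 beakers with a hypothetical split of the same 80 daphnids into J = 16 beakers of 5; the helper names are illustrative only.

```python
from math import asin, sqrt
from statistics import NormalDist


def effective_sample_size(N, J, theta):
    n = N / J  # daphnids per beaker
    return N * (1 + theta) / (1 + n * theta)


def arcsine_power(p0, p, n_eff, alpha=0.05):
    z = NormalDist()
    shift = 2 * asin(sqrt(p)) - 2 * asin(sqrt(p0))
    return z.cdf(shift / sqrt(2.0 / n_eff) - z.inv_cdf(1 - alpha))


def design_power(N, J, theta, p0, p, alpha=0.05):
    """Power of one treatment-versus-control comparison for a given allocation."""
    return arcsine_power(p0, p, effective_sample_size(N, J, theta), alpha)


if __name__ == "__main__":
    for J in (4, 16):  # J = 4 as in LeBlanc's tests; J = 16 is a hypothetical alternative
        for theta in (0.02, 0.7, 2.0):
            print(f"J={J:2d} theta={theta:<4} "
                  f"N_eff={effective_sample_size(80, J, theta):5.1f} "
                  f"power 10% vs 30%={design_power(80, J, theta, 0.10, 0.30):.2f} "
                  f"power 10% vs 50%={design_power(80, J, theta, 0.10, 0.50):.2f}")
```

Under these assumptions the J = 16 split keeps the power above 0.9 for 10% versus 50% mortality even at θ = 2.0, whereas J = 4 falls below 0.5.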

We  thus  see  that  the  sensitivity  of  a  test  depends  heavily  on  the 
extent  of  beaker-to-beaker  heterogeneity  within  groups  as  well  as  on  the 
number  of  daphnids  per  group.  In  turn,  the  effects  of  beaker-to-beaker 
heterogeneity  depend  on  the  number  of  beakers  per  group.  The  larger  the 
number of beakers and the smaller the number of daphnids per beaker, the
smaller the loss of precision.


D.  UNEQUAL  ALLOCATION  OF  TESTING  EFFORT  AMONG  TREATMENT  GROUPS 


Feder  and  Collins  [1] ,  Subsection  XVIIIE  suggested  that  under  certain 
circumstances  it  might  be  sensible  to  have  an  asymmetric  allocation  of 
beakers  and  daphnids  to  the  various  experimental  groups.  In  particular 
they  state  "...if  on  the  basis  of  either  a  priori  scientific  information  or 
previous  testing  some  information  was  available  concerning  mortality  rates 
to  be  expected  at  the  various  treatment  groups,  then  unequal  allocation  of 
experimental  effort  would  be  preferable.  In  particular  at  the  higher 
treatment  groups,  where  mortality  would  be  expected  to  be  substantially 
higher  than  the  control  rate,  it  is  easy  to  detect  differences  from  the 
control.  Thus  the  experimental  effort  should  be  decreased  at  these  groups. 
At the lower experimental groups, where it is more difficult to detect
differences from the control group, the experimental effort should be
increased  to  improve  sensitivity.  Thus  the  degree  of  experimental  effort 
should  in  general  decrease  as  the  toxicant  level  increases..." 

We  discuss  here  an  approach  for  arriving  at  an  unequal  allocation. 

This  procedure  leads  to  equal  allocations  across  groups  when  there  is  a 
priori  total  ignorance  about  response  levels  to  be  expected.  The  greater 
the  degree  of  prior  information  or  beliefs  about  expected  response  levels, 
the  more  asymmetric  will  be  the  allocation. 

The  use  of  prior  knowledge  or  information  in  designing  tests  seems 
quite  sensible.  With  all  the  accumulated  experience  in  the  aquatic 
toxicology  literature,  there  is  no  need  to  act  as  if  each  test  being  run  is 
the first test ever. Using equal allocations of beakers across groups
amounts to making exactly that assumption.

We  now  discuss  the  details  of  a  procedure  for  arriving  at  unequal 
allocations.  We  illustrate  the  procedure  with  a  hypothetical  example,  but 
one  bearing  similarities  to  several  of  the  data  sets  we  have  studied. 

Suppose there is a control group and I treatment groups (denoted as
Group 0 and Groups 1 to I, respectively). Let p_0, p_1, ..., p_I denote the
average mortality rates in these groups. Based on prior knowledge,
information, or belief, we can place bounds on these rates, namely

$$\ell_0 \le p_0 \le u_0, \quad \ell_1 \le p_1 \le u_1, \quad \ldots, \quad \ell_I \le p_I \le u_I.$$

The ℓ's and u's are specified quantities. Total ignorance would correspond
to ℓ_0 = ℓ_1 = ... = ℓ_I = 0 and u_0 = u_1 = ... = u_I = 1. We wish to test the
hypotheses H0: p_i = p_0 against the one-sided alternative H1: p_i > p_0. We
stipulate that it is important to reject H0 whenever p_i is Δ or more above
the control group mortality rate (e.g., Δ = .10 or Δ = .20).

We  further  assume  that  cost,  experimental,  and  logistical  constraints 
have  placed  limits  on  the  total  number  of  beakers  and  the  total  number  of 
daphnids  to  be  used  in  the  test  as  well  as  the  numbers  of  daphnids  to  be 
placed in each beaker. Each beaker used for determining survival rates
contains the same number of daphnids, n. We assume that the extent of
beaker-to-beaker heterogeneity is constant across groups. Thus n_eff, the
effective sample size within beakers, is also constant across groups. We
allocate the available beakers across groups.

In  the  course  of  determining  the  numbers  of  beakers  per  group  we  do  not 
take into account limitations imposed by the proportional diluter
apparatus. If a suggested group allocation exceeds the maximum number of
beakers that the diluter can handle in one group, then use the maximum
number possible in that group and reallocate the remaining beakers by the
procedure discussed below.


For purposes of planning the allocation of experimental effort among
treatment groups, make the conservative assumption that p_0 = u_0,
p_1 = ℓ_1, ..., p_I = ℓ_I. We calculate the sample sizes necessary to attain
specified power when p_i = max(u_0 + Δ, ℓ_i), i = 1, ..., I. If ℓ_i < u_0 + Δ
for all i, then an equal allocation plan is called for. The tighter the
bounds around the p_i's are, the more unequal the allocation scheme will be.

Suppose a type I error level α and power 1 − β are desired. Let
θ_0 = 2 arcsin[u_0]^{1/2} and θ_i = 2 arcsin[max(u_0 + Δ, ℓ_i)]^{1/2},
i = 1, ..., I. (Here θ_i denotes an arcsine-transformed rate, not the
heterogeneity parameter θ of Subsection B.) Let n_0, n_1, ..., n_I denote the
numbers of daphnids in the groups, and assume for discussion purposes that
there is no beaker-to-beaker variation.

Let Z_{1−α} and Z_{1−β} denote the 1 − α and 1 − β quantiles of the
standard normal distribution, respectively. Then the sample sizes must
satisfy the relations

$$\frac{1}{n_i} + \frac{1}{n_0} \le \frac{(\theta_i - \theta_0)^2}{(Z_{1-\alpha} + Z_{1-\beta})^2}, \qquad i = 1, \ldots, I.$$
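The planning rates and right-hand sides of these relations can be computed directly. The sketch below (the function name is hypothetical) uses exact normal quantiles and unrounded arcsine values. Because the worked example below rounds the θ_i to two decimals, its bounds (.0143, .0269, ...) come out slightly larger than the unrounded values returned here (for example, about .0138 rather than .0143 for Groups 1 and 2).

```python
from math import asin, sqrt
from statistics import NormalDist


def planning_thresholds(u0, lower, delta, alpha=0.05, power=0.90):
    """Planning rates p_i = max(u0 + delta, l_i) and bounds on 1/n_i + 1/n_0.

    `lower` holds l_1, ..., l_I for the treatment groups. Returns a list of
    (p_i, theta_i, rhs_i) with rhs_i = (theta_i - theta_0)^2 / (Z_{1-a} + Z_{1-b})^2.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    theta0 = 2 * asin(sqrt(u0))
    out = []
    for l_i in lower:
        p_i = max(u0 + delta, l_i)
        theta_i = 2 * asin(sqrt(p_i))
        out.append((p_i, theta_i, (theta_i - theta0) ** 2 / z_sum ** 2))
    return out


if __name__ == "__main__":
    # Bounds from the hypothetical example: u_0 = .05 and l_1, ..., l_5.
    results = planning_thresholds(0.05, [0.0, 0.10, 0.20, 0.40, 0.80], delta=0.10)
    for i, (p_i, theta_i, rhs_i) in enumerate(results, start=1):
        print(f"group {i}: p_i = {p_i:.2f}, theta_i = {theta_i:.2f}, "
              f"1/n_i + 1/n_0 <= {rhs_i:.4f}")
```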


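Consider a hypothetical example. Suppose I = 5, Δ = .10, α = .05, and
1 − β = .90. Then Z_.95 = 1.645, Z_.90 = 1.282, and

$$\begin{array}{lll}
0 \le p_0 \le .05 = u_0 & & \theta_0 = .45 \\
\ell_1 = 0 \le p_1 \le .10 & \max(u_0 + \Delta, \ell_1) = .15 & \theta_1 = .80 \\
\ell_2 = .10 \le p_2 \le .20 & \max(u_0 + \Delta, \ell_2) = .15 & \theta_2 = .80 \\
\ell_3 = .20 \le p_3 \le .30 & \max(u_0 + \Delta, \ell_3) = .20 & \theta_3 = .93 \\
\ell_4 = .40 \le p_4 \le .70 & \max(u_0 + \Delta, \ell_4) = .40 & \theta_4 = 1.37 \\
\ell_5 = .80 \le p_5 \le 1.00 & \max(u_0 + \Delta, \ell_5) = .80 & \theta_5 = 2.21
\end{array}$$

The group sample sizes n_i satisfy the relations

$$\begin{aligned}
1/n_1 + 1/n_0 &\le (.80 - .45)^2/(1.645 + 1.282)^2 = .0143 \\
1/n_2 + 1/n_0 &\le .0143 \\
1/n_3 + 1/n_0 &\le .0269 \\
1/n_4 + 1/n_0 &\le .0988 \\
1/n_5 + 1/n_0 &\le .3616.
\end{aligned}$$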


Since 1/n_0 must be less than .0143, n_0 must be at least 70. The group
allocations that satisfy the above relations and that approximately minimize
the total number of daphnids required are:

n_0 = 175, n_1 = 117, n_2 = 117, n_3 = 48, n_4 = 11, n_5 = 2.

If a beaker contains 25 daphnids, this suggests having 7 control beakers
and 5, 5, 2, 1, 1 beakers allocated to the treatment groups (going from low
to high concentrations).
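The allocation step can be reproduced by treating the five relations as constraints. The sketch below (helper names hypothetical) takes the rounded right-hand sides from the example, finds the smallest integer n_i satisfying each relation for a chosen n_0, converts group sizes to beakers of 25, and scans n_0 for the smallest total. For n_0 = 175 it returns n_1 = n_2 = 117, n_3 = 48, and n_4 = 11, as in the text; for Group 5 the smallest integer meeting 1/n_5 + 1/175 ≤ .3616 is 3 rather than 2, which still corresponds to a single beaker. The scan is a plain grid search over n_0, not necessarily the method used in the report.

```python
import math

# Right-hand sides of the five relations in the example (rounded as in the text).
RHS = [0.0143, 0.0143, 0.0269, 0.0988, 0.3616]


def min_group_sizes(n0, rhs):
    """Smallest integers n_i with 1/n_i + 1/n0 <= rhs_i, one per treatment group."""
    sizes = []
    for r in rhs:
        slack = r - 1.0 / n0
        if slack <= 0:
            raise ValueError(f"n0={n0} is too small for bound {r}")
        sizes.append(math.ceil(1.0 / slack))
    return sizes


def minimum_control_size(rhs):
    """1/n0 must lie strictly below the smallest bound, so n0 > 1/min(rhs)."""
    return math.floor(1.0 / min(rhs)) + 1


def best_control_size(rhs, n0_max=1000):
    """Grid search over n0 for the smallest total n0 + sum(n_i)."""
    totals = {n0: n0 + sum(min_group_sizes(n0, rhs))
              for n0 in range(minimum_control_size(rhs), n0_max + 1)}
    n0 = min(totals, key=totals.get)
    return n0, totals[n0]


if __name__ == "__main__":
    print("minimum n0:", minimum_control_size(RHS))
    sizes = min_group_sizes(175, RHS)
    print("n_i for n0 = 175:", sizes)
    print("beakers of 25:", [math.ceil(s / 25) for s in [175] + sizes])
    print("grid search (n0, total):", best_control_size(RHS))
```

Under these rounded bounds, n_0 = 175 gives a total of 471 daphnids, within one or two of the smallest total the scan finds; choosing n_0 = 175 also makes the control group exactly 7 beakers of 25.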


SPECIAL  TOPICS.  ANALYSIS  OF  TIME  TRENDS  IN 
PRODUCTIVITY.  ANALYSIS  OF  TIME  TO  DEATH 


A.  INTRODUCTION 


This  section  discusses  several  specialized  topics.  The  analyses 
discussed  here  provide  information  concerning  aspects  of  the  data  that 
were  not  considered  in  the  previous  sections. 

Subsection  B  is  concerned  with  the  repeated  measures  analysis  of 
time  trends  in  productivity  data.  The  analyses  in  Sections  XI  and  XII 
pertain  to  the  total  numbers  of  offspring  from  the  surviving  daphnids 
and  the  variation  in  these  total  numbers  among  concentration  groups. 

The  analyses  in  this  subsection  consider  the  time  patterns  according 
to  which  these  offspring  were  produced  and  the  variation  of  these  time 
patterns  among  concentration  groups. 

Subsection  C  is  concerned  with  the  analysis  of  time  to  death  and 
the  variation  of  the  distributions  of  time  to  death  among  concentration 
groups. The analyses in Sections IX and X pertain to the numbers of
surviving daphnids at various points in time. They do not utilize
information about time to death. Such information increases the
sensitivity of comparisons. The use of both parametric and nonparametric
models is illustrated.


B.  REPEATED  MEASURES  ANALYSIS  OF  TIME  TRENDS  IN  PRODUCTIVITY 


Goulden's  data  on  isophorone  are  displayed  in  Figure  1.8.  The 
numbers  of  live  offspring  corresponding  to  days  2,  4,  6,  8,  11,  13,  15, 
18, and 21 are shown for each group. Since there were no offspring and
just one surviving daphnid in group 6, that group is deleted from
comparisons of time trends in productivity. The data corresponding to the
daphnid in group 2, beaker 1 are deleted from the comparisons because
that daphnid died while on test. There were no offspring observed on
days 2, 4, and 6, and just two daphnids produced offspring on day 8.

Thus  the  comparisons  of  time  trends  in  productivity  were  restricted  to 
days  11  to  21,  except  for  the  two  daphnids  that  produced  offspring  on 
day  8;  the  day  8  values  were  used  for  these  daphnids. 

The  analyses  of  time  trends  in  productivity  require  special  analysis 
techniques  since  the  numbers  of  offspring  produced  by  the  same  daphnid 
at  different  points  in  time  would  be  expected  to  be  correlated.  That  is, 
some  daphnids  might  produce  relatively  large  numbers  of  offspring  at 
each  time  point  while  others  might  produce  relatively  few  offspring  at 
each  time  point.  Such  data,  consisting  of  responses  over  time  for  each 
subject, are often referred to as repeated measures data or longitudinal
data. A large number of techniques, of varying degrees of generality and
complexity, are available for analyzing such data. The methods discussed
in this section correspond to a relatively simple model and are readily
applied.

Figure  XV. 1  displays  the  results  of  a  repeated  measures  analysis  of 
variance  on  the  numbers  of  offspring  produced  on  days  11,  13,  15,  18,  and 
21 by the surviving daphnids in groups 1 to 5 in the Goulden test on
isophorone. The display was prepared with program P2V in the BMDP statistical
computing  system. 

There  are  six  panels  in  the  analysis  of  variance  table.  The  first 
panel  corresponds  to  an  analysis  of  variance  on  the  average  productivity 
of  each  daphnid  across  all  five  days.  The  three  lines--MEAN,  GROUP,  and 
ERROR--correspond  to  a  standard  one  way  analysis  of  variance  with  five 
groups. This analysis of variance table is analogous to that discussed
for the Goulden data in Section XI.B. The discussion there pertains to
totals rather than averages, and one observation is deleted from Group 1,
but otherwise the results are analogous. Daphnid-to-daphnid random
variability would be expected to be reflected in the average values across
time and thus in the error sum of squares.

The  second  to  fifth  panels  correspond  to  analyses  of  variance  of 
various  contrasts  in  the  numbers  of  offspring  across  time.  The  notations 
N(1), N(2), N(3), N(4) correspond to the linear, quadratic, cubic, and
quartic orthogonal polynomial components of the time trends in offspring.
The top line in each panel corresponds to the overall effect across
treatment groups, the second line corresponds to the interaction with
groups, and the third line is the error term corresponding to that component.

If the first mean square (e.g., N(1)) is statistically significant, this
provides evidence of a significant overall polynomial trend component
(e.g., linear component). If the second mean square (e.g., N(1)G) is
statistically  significant,  this  provides  evidence  that  the  values  of 
the  polynomial  component  vary  from  group  to  group.  The  manner  in  which 
they  vary  across  groups  needs  to  be  studied  further. 

The sums of squares in the bottom panel are the sums of the corresponding
sums of squares in the second to fifth panels. This panel represents a
combined test of the presence of an overall time trend in numbers of
offspring (NYNG effect) and the interaction of this time trend with group
(NG effect). The pooling of the sums of squares for the orthogonal
polynomial components assumes that these components are statistically
independent and have the same variance. Two adjustments to the error
degrees of freedom in the bottom panel account for departures from this
assumption. The Greenhouse-Geisser and Huynh-Feldt adjustment factors are
given at the bottom of the figure, and their effects on the significance
levels of the overall tests are shown to the right of the bottom panel.


The  results  in  Figure  XV. 1  show  very  strong  statistical  evidence  of 
group  to  group  differences  in  average  numbers  of  offspring  on  days  11  to 
21.  This  is  in  direct  correspondence  with  the  results  shown  in 
Section  XI. B.  There  are  significant  overall  linear,  quadratic,  and  cubic 
time  trend  components  as  well  as  a  significant  interaction  between  the 
linear  component  of  time  trend  and  concentration  group. 

Since the results in the second to sixth panels are based on contrasts
across time, at least a portion of the daphnid-to-daphnid random
variability would be expected to be eliminated from these components.

Thus two error terms (denoted as 1 and 2 in the leftmost column of
Figure XV.1) are shown. The first is used for comparing average values
across time and the second is used for comparing contrasts within daphnids.

The analysis of variance calculations show the presence of time
trends and the presence of group-to-group differences in the productivity
responses; however, they do not show the nature of these trends and
differences. Additional analyses were carried out to characterize the
nature of the differences and trends. For each daphnid, average responses
over time and orthogonal polynomial contrasts were calculated. Let P_8,
P_11, P_13, P_15, P_18, and P_21 denote the numbers of offspring produced by
a daphnid on days 8, 11, 13, 15, 18, and 21, respectively. For those
daphnids with zero productivity on day 8, the orthogonal polynomial
contrasts were defined as

$$\mathrm{AVG} = (P_{11} + P_{13} + P_{15} + P_{18} + P_{21})/5^{1/2}$$
$$\mathrm{LIN} = (-2P_{11} - P_{13} + P_{18} + 2P_{21})/10^{1/2}$$
$$\mathrm{QUADR} = (2P_{11} - P_{13} - 2P_{15} - P_{18} + 2P_{21})/14^{1/2}$$
$$\mathrm{CUBIC} = (-P_{11} + 2P_{13} - 2P_{18} + P_{21})/10^{1/2}$$


The quartic contrast was not calculated because of its nonsignificance
in the analysis of variance calculations in Figure XV.1. Contrasts
analogous to those above (but including P_8) were calculated for the two
daphnids that had offspring on day 8.
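The contrasts can be computed per daphnid in a few lines. The sketch below assumes counts are supplied in day order 11, 13, 15, 18, 21; like the definitions above, the coefficients treat these five occasions as equally spaced, although the actual gaps between days are 2, 2, 3, and 3. Dividing by the square roots of 5, 10, 14, and 10 gives each coefficient vector unit length, and the vectors are mutually orthogonal, which the usage lines check. Names are hypothetical.

```python
from math import sqrt

# Unnormalized coefficients for days 11, 13, 15, 18, 21, as in the definitions above.
COEFFICIENTS = {
    "AVG": [1, 1, 1, 1, 1],
    "LIN": [-2, -1, 0, 1, 2],
    "QUADR": [2, -1, -2, -1, 2],
    "CUBIC": [-1, 2, 0, -2, 1],
}


def unit(coefs):
    """Scale a coefficient vector to unit length (divides by sqrt(5), sqrt(10), sqrt(14), ...)."""
    norm = sqrt(sum(c * c for c in coefs))
    return [c / norm for c in coefs]


def contrast_scores(counts):
    """AVG, LIN, QUADR and CUBIC scores for one daphnid's offspring counts P11..P21."""
    if len(counts) != 5:
        raise ValueError("expected counts for days 11, 13, 15, 18 and 21")
    return {name: sum(w * p for w, p in zip(unit(c), counts))
            for name, c in COEFFICIENTS.items()}


if __name__ == "__main__":
    names = list(COEFFICIENTS)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dot = sum(x * y for x, y in zip(COEFFICIENTS[a], COEFFICIENTS[b]))
            print(f"{a} . {b} = {dot}")  # all zero: the contrasts are orthogonal
    print(contrast_scores([0, 1, 2, 3, 4]))  # a purely linear (hypothetical) pattern
```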

Histograms,  summary  statistics,  and  analysis  of  variance  comparisons 
among  groups  are  displayed  for  the  AVG,  LIN,  QUADR,  and  CUBIC  components 
in  Figures  XV. 2  to  XV. 5,  respectively.  These  displays  were  prepared  with 
program  P7D  in  the  BMDP  statistical  computing  system. 

Figure XV.2 shows a quadratic-like trend with concentration for the
average  numbers  of  offspring  across  time  per  daphnid.  This  trend  is 
directly  related  to  that  displayed  in  Figure  XII. 14. 


The  overall  mean  values  of  the  LIN,  QUADR,  and  CUBIC  components 
across  all  groups  combined  are  substantially  greater  than  their  standard 
errors  (see  lower  left  portions  of  Figures  XV. 3,  XV. 4,  XV. 5).  This 
reflects  the  significant  linear,  quadratic,  and  cubic  main  effects  shown 
in  Figure  XV. 1 . 

Figures XV.3 and XV.5 show significant group-to-group differences in
the linear and cubic components, respectively. Figure XV.4 shows a
significant difference among groups when the possibility of unequal variances
is taken into account (Welch statistic) but not otherwise. The histograms
in these three figures suggest linear trends with concentration or log
concentration for each of the orthogonal polynomial components. The
trend slopes are positive for the linear and quadratic components and
negative for the cubic component.
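For readers without BMDP, the two statistics mentioned here can be computed directly. The sketch below implements the classical one-way analysis of variance F statistic and Welch's heteroscedastic F statistic in their standard textbook forms; it is not the BMDP P7D implementation, whose output is organized differently. The group data would be the per-daphnid contrast scores for each concentration group; the numbers in the usage lines are hypothetical.

```python
from statistics import mean, variance


def oneway_anova_f(groups):
    """Classical one-way ANOVA F statistic and its (k - 1, N - k) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, (k - 1, n_total - k)


def welch_anova_f(groups):
    """Welch's F statistic for equal means without assuming equal variances.

    Each group needs at least two observations and a nonzero sample variance.
    """
    k = len(groups)
    w = [len(g) / variance(g) for g in groups]  # precision weights n_i / s_i^2
    w_sum = sum(w)
    m_w = sum(wi * mean(g) for wi, g in zip(w, groups)) / w_sum
    numerator = sum(wi * (mean(g) - m_w) ** 2 for wi, g in zip(w, groups)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (len(g) - 1) for wi, g in zip(w, groups))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return numerator / denominator, (k - 1, (k ** 2 - 1) / (3 * lam))


if __name__ == "__main__":
    # Hypothetical contrast scores for three concentration groups.
    scores = [[1.2, 0.8, 1.5, 1.1], [2.0, 2.6, 1.9, 2.4], [3.1, 4.2, 2.5, 3.8]]
    print("classical F:", oneway_anova_f(scores))
    print("Welch F:", welch_anova_f(scores))
```

With equal group variances and sizes the two statistics are close; they diverge when the spread differs markedly across groups, which is the situation in which the text reports a Welch-only difference.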

Based on the results shown in Figures XV.1 to XV.5, polynomial regression
functions were fitted to describe the trends in productivity across
time and across concentration groups. As shown in Figure XV.1, separate
error  estimates  are  appropriate  for  analyzing  the  average  (or  total) 
productivity  across  time  for  each  daphnid  and  for  analyzing  the  contrasts 
that  represent  the  time  trends  in  productivity  for  each  daphnid.  These 
separate  error  estimates  are  referred  to  there  as  components  1  and  2, 
respectively. 

Let $P_i$ denote the number of offspring for a daphnid on the i-th day and let $\bar{P}$ denote the average number of offspring for that daphnid across all days. Then

$$P_i = \bar{P} + (P_i - \bar{P})$$

We fit separate response curves to $\bar{P}$ and to $P_i - \bar{P}$.

The dose response curve fit to $\bar{P}$ is directly analogous to the dose response curves that were fitted in Section XII to the total productivity responses. In particular, refer to Section XII.B and Figures XII.13 and XII.14 for the fit to the Goulden data. Thus we need not repeat this discussion here.

The response curve fit to the $P_i - \bar{P}$ values reflects the trends in productivity both across time and across concentration groups. Although the $P_i - \bar{P}$ values within a daphnid are slightly negatively correlated, we treat these values as approximately independent for the purpose of the analysis illustrated below. This approximation can be refined by using generalized least squares or multivariate analysis techniques.

Let C denote concentration. Let D denote day and let $\bar{D}$ and $s_d$ denote the mean and standard deviation of D. Define

$$\Delta P_i = P_i - \bar{P}, \qquad X = \log_{10}(1 + C), \qquad d = (D - \bar{D})/s_d$$

As suggested by the significance of the polynomial trend components in Figure XV.1 and by the nature of the concentration-related trends displayed in Figures XV.3 to XV.5, we fit the following regression model to the trend data:

$$\Delta P_{ijk} = \beta_0 + \beta_1 d_i + \beta_2 X_j + \beta_3 d_i X_j + \beta_4 d_i^2 + \beta_5 d_i^2 X_j + \beta_6 d_i^3 + \beta_7 d_i^3 X_j + e_{ijk}$$

The indices i, j, k correspond to day, concentration, and replicate number, respectively. The term $e_{ijk}$ is the error term, which is assumed to be approximately independent with constant variance.
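As a rough illustration of fitting this model by ordinary least squares (treating the $\Delta P_{ijk}$ as independent, as the text does), the sketch below builds the eight-column design matrix and solves it with `numpy.linalg.lstsq`. The day and concentration layout, the coefficient values in `beta_true`, and the simulated responses are all hypothetical; they are not the Goulden data.

```python
import numpy as np

def trend_design(d, x):
    """Columns 1, d, X, dX, d^2, d^2 X, d^3, d^3 X (coefficients beta_0..beta_7)."""
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(d), d, x, d * x,
                            d**2, d**2 * x, d**3, d**3 * x])

# Hypothetical layout: one record per (day, concentration) pair.
D, C = np.meshgrid([11.0, 13.0, 15.0, 18.0, 21.0], [0.0, 10.0, 50.0, 100.0, 150.0])
D, C = D.ravel(), C.ravel()
d = (D - D.mean()) / D.std(ddof=1)   # standardized day
X = np.log10(1.0 + C)                # transformed concentration

beta_true = np.array([0.0, 1.0, 0.0, 0.5, -0.8, 0.2, -0.3, -0.1])  # hypothetical
rng = np.random.default_rng(0)
A = trend_design(d, X)
delta_p = A @ beta_true + rng.normal(0.0, 0.5, size=d.size)

beta_hat, *_ = np.linalg.lstsq(A, delta_p, rcond=None)
resid = delta_p - A @ beta_hat
r_square = 1.0 - resid @ resid / np.sum((delta_p - delta_p.mean()) ** 2)
print(np.round(beta_hat, 2), round(float(r_square), 3))
```

A generalized least squares or multivariate fit, as mentioned above, would replace the `lstsq` call with one that accounts for the within-daphnid correlation.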

The results of this fit are shown in Table XV.6. All the estimated coefficients except those for the $X_j$ and $d_i^2 X_j$ terms (X, IDY2X) are significant. The multiple R-square is just 0.21. Thus there is definite statistical evidence of time trends in the production of offspring and of variation of these time trends with concentration; however, most of the variation in numbers of offspring is random variation from daphnid to daphnid and is not explained by this systematic trend model.

We note from the signs of the regression coefficients that the concentration-related trend in the linear time trend term (IDYX) is positive, that in the quadratic time trend term (IDY2X) is positive, and that in the cubic time trend term (IDY3X) is negative. This agrees with the concentration-related trends observed in Figures XV.3 to XV.5.

Various inferences concerning the time trends in productivity and their relation to concentration can be based on the fitted regression model. For example, it might be of interest to estimate the day associated with maximum productivity and its relation to concentration. At the time of maximum productivity the first time derivative of the regression function is zero and the second time derivative is negative. Let $D_{max}$ denote the time of maximum productivity and let $d_{max} = (D_{max} - \bar{D})/s_d$. Thus $d_{max}$ satisfies the relations

$$3(\beta_6 + \beta_7 X)\,d_{max}^2 + 2(\beta_4 + \beta_5 X)\,d_{max} + (\beta_1 + \beta_3 X) = 0$$

$$6(\beta_6 + \beta_7 X)\,d_{max} + 2(\beta_4 + \beta_5 X) < 0$$
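The two relations can be solved directly: the first is a quadratic in $d_{max}$, and the second selects the root that is a maximum. The sketch below assumes the coefficients are supplied in the order $\beta_0, \ldots, \beta_7$ of the model above; the helper name `day_of_max` and the example coefficients are hypothetical.

```python
import math

def day_of_max(beta, x, d_bar, s_d):
    """Return D_max = d_bar + s_d * d_max, where d_max solves
    3(b6 + b7 X) d^2 + 2(b4 + b5 X) d + (b1 + b3 X) = 0 with
    6(b6 + b7 X) d + 2(b4 + b5 X) < 0; return None if no such root exists."""
    a3 = beta[6] + beta[7] * x   # coefficient of d^3 in the fitted curve
    a2 = beta[4] + beta[5] * x   # coefficient of d^2
    a1 = beta[1] + beta[3] * x   # coefficient of d
    if abs(a3) < 1e-12:          # no cubic term: the curve is quadratic in d
        roots = [-a1 / (2 * a2)] if a2 != 0 else []
    else:
        disc = (2 * a2) ** 2 - 4 * (3 * a3) * a1
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-2 * a2 + sq) / (6 * a3), (-2 * a2 - sq) / (6 * a3)]
    maxima = [r for r in roots if 6 * a3 * r + 2 * a2 < 0]
    return d_bar + s_d * maxima[0] if maxima else None

# Hypothetical coefficients: the curve -d^3 + 3d has its maximum at d = 1.
print(day_of_max([0, 3, 0, 0, 0, 0, -1, 0], x=0.0, d_bar=15.6, s_d=3.7))
```

Looping this over the X values of the test concentrations, with the fitted coefficients, gives estimates of the kind tabulated below.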
Substitution of the estimated regression coefficients shown in Figure XV. into these relations, along with the values of X corresponding to the concentrations C, results in the following estimates of $D_{max}$.


Concentration, C    Dmax (days)
        0              12.7
       10              12.8
       50              13.2
      100              14.0
      150              15.7

The  estimated  times  of  maximum  productivity  are  seen  to  increase  with 
increasing  concentration. 

Approximate  standard  errors  for  these  estimates  can  be  calculated 
based  on  the  variance-covariance  matrix  of  the  regression  coefficients 
(not  shown)  and  the  delta  method.  The  1983  versions  of  the  P3R  and  PAR 
nonlinear  regression  programs  in  the  BMDP  statistical  computing  system 
can  directly  calculate  estimates  of  such  nonlinear  functions  of  the 
regression  coefficients,  along  with  associated  standard  error  estimates 
and  confidence  intervals. 
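Where those programs are not available, the delta method can be applied directly. The sketch below is a generic, hypothetical helper: it approximates the gradient of any scalar function of the coefficients by central differences and returns $\sqrt{\nabla g' \hat\Sigma \nabla g}$. In practice `g` would wrap the $D_{max}$ calculation and `cov` would be the coefficient covariance matrix from the regression output; here a simple quadratic-vertex function and made-up numbers stand in for both.

```python
import numpy as np

def delta_method_se(g, beta, cov, rel_step=1e-6):
    """Delta-method standard error sqrt(grad' Cov grad) of the scalar function g
    at beta, with the gradient approximated by central differences."""
    beta = np.asarray(beta, dtype=float)
    grad = np.empty_like(beta)
    for k in range(beta.size):
        h = rel_step * max(1.0, abs(beta[k]))
        up, down = beta.copy(), beta.copy()
        up[k] += h
        down[k] -= h
        grad[k] = (g(up) - g(down)) / (2.0 * h)
    return float(np.sqrt(grad @ np.asarray(cov, dtype=float) @ grad)), grad

# Hypothetical example: peak of a quadratic time trend, d_max = -b1 / (2 b4).
def vertex(b):
    return -b[0] / (2.0 * b[1])

beta_hat = np.array([0.6, -0.9])               # hypothetical (b1, b4)
cov_hat = np.array([[0.010, 0.002],
                    [0.002, 0.020]])           # hypothetical covariance matrix
se, grad = delta_method_se(vertex, beta_hat, cov_hat)
print(round(vertex(beta_hat), 4), round(se, 4))
```

An approximate 95 percent interval is then the estimate plus or minus 1.96 standard errors, transformed back to days through $D = \bar{D} + s_d d$ if the function is written on the standardized scale.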


C.  ANALYSIS  OF  TIME  TO  DEATH  AND  ITS  RELATION  TO  CONCENTRATION 


Sections IX and X contain analyses of the mortality responses. The statistical methods applied in those sections focus on one or on several study days and compare across groups the mortality rates observed up to and including those days. The methods and models applied in those sections do not utilize the specific time to death, only whether or not the death occurred prior to the day under consideration. These methods are based on the binomial or on the multinomial distribution.

Many  investigators  compile  the  (approximate)  times  to  death. 
Analysis  procedures  exist  that  utilize  such  information.  Such  analysis 
procedures  utilize  more  information  than  the  binomial  based  procedures 
and  thus  might  increase  the  sensitivity  of  inferences.  We  illustrate 
several  such  procedures  in  this  subsection. 

Figure XV.7 displays the observed times to death in Chapman's 21 day test on chromium. The columns in that figure contain the group number, the average measured concentration in each group, the day of death or of survival, the censor code, and the number of daphnids associated with that case (frequency). The censor code indicates whether the day indicated is associated with a death or with daphnids that survived to the end of the test. For daphnids that survived to the end of the test, the times to death are known only to the extent that they exceed 21 days. Such data are referred to as right censored. The days associated with cases having censor code 1 represent numbers of days to death. The days associated with cases having censor code 2 represent survival beyond that number of days. Statistical procedures to analyze time to death data must account for such censoring.

Two aspects of the Chapman data shown in Figure XV.7 should be noted. First, the water control and the carrier control groups were combined for the purposes of the analyses in this subsection and are denoted as Group 1. There are 20 daphnids in this group; one died on day 10, one died on day 21, and 18 survived to the end of the test (i.e., beyond day 21). Second, the times to death shown are approximate. The beakers were examined on days 3, 5, 7, 10, 12, 14, 17, 19, and 21. The daphnids reported as dead at each time point actually died some time between successive inspections. Thus a death indicated as day 14 could in fact have occurred on day 13 or 14; a death indicated as day 10 could in fact have occurred on day 8, 9, or 10. Such data are called interval censored; their values are known only to lie in an interval. Statistical methods exist for analyzing interval censored data (Meeker and Duke [33]). However, for purposes of illustration we treat the times to death as if they occurred on the days indicated. If the times to death are to be used for statistical analysis, then the test beakers need to be inspected more than three times per week; they should probably be inspected daily.

Figure XV.8 contains comparisons of the 21 day mortality rates across treatment groups. Two tests are shown, the chi square test of homogeneity and the Cochran-Armitage test of linear trend. Both of these tests were discussed and illustrated in Section IX.

Both tests indicate strong statistical evidence of lack of homogeneity of 21 day mortality rates across treatment groups. The difference of the test statistics is 18.095 - 15.344 = 2.751, based on 6 - 1 = 5 d.f. This difference provides a test of nonlinear trends in mortality rates, and it is well below the five percent critical value of the chi square distribution with 5 d.f. (11.07). Thus there is strong statistical evidence of a linear trend in mortality rates, but no statistical evidence of higher order trends.
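Both statistics are simple to compute from the group sizes and death counts. In the sketch below the 21 day death counts (2, 1, 1, 3, 6, 5, 6 out of 20, 10, 10, 10, 10, 10, 10) are taken from the Group 1 description above and from the binomial mortality estimates tabulated later in this subsection. The Cochran-Armitage scores are assumed to be the group numbers 1 to 7; that choice is ours, although with it the sketch returns the 18.095 and 15.344 quoted above. Other scores, such as $\log_{10}(1+C)$, would give a different trend statistic.

```python
def homogeneity_chi2(deaths, n):
    """Pearson chi-square statistic for equal mortality rates across groups."""
    p_bar = sum(deaths) / sum(n)
    return sum((x - m * p_bar) ** 2 / (m * p_bar * (1 - p_bar))
               for x, m in zip(deaths, n))

def cochran_armitage_chi2(deaths, n, scores):
    """Cochran-Armitage chi-square (1 d.f.) for a linear trend in mortality
    rates with the given group scores."""
    total = sum(n)
    p_bar = sum(deaths) / total
    s_bar = sum(m * s for m, s in zip(n, scores)) / total
    num = sum(s * (x - m * p_bar) for x, m, s in zip(deaths, n, scores)) ** 2
    den = p_bar * (1 - p_bar) * sum(m * (s - s_bar) ** 2 for m, s in zip(n, scores))
    return num / den

deaths = [2, 1, 1, 3, 6, 5, 6]
n = [20, 10, 10, 10, 10, 10, 10]
chi2_all = homogeneity_chi2(deaths, n)                      # 6 d.f.
chi2_trend = cochran_armitage_chi2(deaths, n, range(1, 8))  # 1 d.f.
print(round(chi2_all, 3), round(chi2_trend, 3), round(chi2_all - chi2_trend, 3))
```

The printed difference is the 5 d.f. statistic for departures from linearity discussed above.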

The  estimated  standard  errors  of  the  21  day  mortality  estimates  are 
approximately  as  follows: 


Group 1:       N = 20, p = 0.1, s = 0.067
Groups 2, 3:   N = 10, p = 0.1, s = 0.095
Group 4:       N = 10, p = 0.3, s = 0.145
Groups 5, 7:   N = 10, p = 0.6, s = 0.155
Group 6:       N = 10, p = 0.5, s = 0.158
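These are the usual binomial standard errors $s = \sqrt{p(1-p)/N}$. A short check in Python (the helper name is ours):

```python
import math

def binomial_se(p, n):
    """Estimated standard error of a binomial proportion p based on n daphnids."""
    return math.sqrt(p * (1.0 - p) / n)

for p, n in [(0.1, 20), (0.1, 10), (0.3, 10), (0.5, 10), (0.6, 10)]:
    print(p, n, round(binomial_se(p, n), 3))   # 0.067, 0.095, 0.145, 0.158, 0.155
```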

We  now  consider  two  approaches  to  the  statistical  analysis  of  the 
times  to  death;  one  is  parametric  and  the  other  is  nonparametric.  We 
first  consider  the  parametric  analysis. 
Preliminary probability plots (not shown) of the times to death (see Meeker and Duke [33]) indicate that the distributions of times to death in each group can be approximated by the normal distribution. We fit a succession of regression models to describe the distributions of times to death and to compare these distributions among treatment groups.

The  regression  models  are  fitted  to  the  data  by  maximum  likelihood 
techniques  using  the  CENSOR  program  (Meeker  and  Duke, [33]).  CENSOR  has 
the  capability  to  accommodate  censored  values  among  the  times  to  death, 
corresponding  to  the  daphnids  that  survived  to  the  end  of  the  test.  The 
models  fitted  to  the  mortality  data  assume  that  time  to  death  is  normally 
distributed  with  constant  variance  across  groups.  The  mean  time  to 
failure  depends  on  the  treatment,  in  a  manner  specified  for  each  model. 
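The likelihood being maximized combines two kinds of terms: each death on day t contributes the normal density at t, and each survivor contributes the probability of surviving beyond day 21. A minimal standard-library sketch of that log likelihood follows. The function name is hypothetical, censoring is coded as a Boolean rather than the 1/2 censor codes of Figure XV.7, and the mean and σ in the example are illustrative values, not fitted estimates; nothing here describes the internals of CENSOR itself.

```python
import math

def censored_normal_loglik(times, censored, freq, mean, sigma):
    """Log likelihood of right-censored, normally distributed times to death.
    Deaths contribute log f(t); survivors (censored=True) contribute log[1 - F(t)]."""
    total = 0.0
    for t, cens, w in zip(times, censored, freq):
        z = (t - mean) / sigma
        if cens:
            total += w * math.log(0.5 * math.erfc(z / math.sqrt(2.0)))  # log(1 - Phi(z))
        else:
            total += w * (-0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi)))
    return total

# Group 1 as described above: deaths on days 10 and 21, 18 survivors beyond day 21.
print(censored_normal_loglik([10, 21, 21], [False, False, True], [1, 1, 18], mean=35.0, sigma=12.0))
```

Maximizing the sum of such terms over a separate mean for each group and a common σ corresponds to the eight parameter model described next.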

The first model fitted is an analysis of variance type model. It assumes a different mean time to failure in each concentration group, with no assumptions concerning the form of the trend. This is an eight parameter model: a mean time to failure in each of the seven groups and a common scale parameter. The results of fitting this model to the data are shown in Figure XV.9. The coefficient $B_0$ represents the estimated mean time to death (i.e., the 50th percentile) in Group 1, the control group. This is 35.59 days and is well beyond the end of the test. The coefficients $B_1$ to $B_6$ represent the differences between the estimated mean times to death in Groups 2 to 7 and that in Group 1. The estimated mean times to death in these groups are thus:


Group   Conc.   Mean Time to Death (Days)
  2       13    35.59 -  3.16 = 32.43
  3       29    35.59 -  3.16 = 32.43
  4       66    35.59 - 10.30 = 25.29
  5      132    35.59 - 18.00 = 17.59
  6      294    35.59 - 14.16 = 21.43
  7      655    35.59 - 14.54 = 21.05

The 95 percent confidence intervals on $B_4$, $B_5$, and $B_6$ do not contain 0. Thus the mean times to death are significantly lower in Groups 5, 6, and 7 than in the control group at the five percent level of significance if no adjustment is made for simultaneous inference. After adjustment for simultaneous inferences by Bonferroni's technique, only $B_4$ is significantly different from 0.
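For reference, the Bonferroni adjustment for the six treatment-versus-control comparisons amounts to replacing the usual two-sided 5 percent normal critical value with the one for $\alpha/(2 \cdot 6)$. A short sketch with Python's `statistics.NormalDist`; treating the intervals as large-sample normal intervals is our assumption about how the intervals in Figure XV.9 were formed.

```python
from statistics import NormalDist

def bonferroni_z(alpha=0.05, comparisons=6):
    """Two-sided normal critical value per comparison so that the family-wise
    error rate over `comparisons` tests is at most alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / (2 * comparisons))

z_unadjusted = NormalDist().inv_cdf(0.975)
z_adjusted = bonferroni_z()
print(round(z_unadjusted, 2), round(z_adjusted, 2))
```

An interval $B_k \pm z \cdot SE(B_k)$ then excludes 0 after adjustment only if $|B_k|/SE(B_k)$ exceeds about 2.64 rather than 1.96.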

The  second  model  fitted  is  a  linear  regression  model.  Let  C  denote 
the  concentration.  This  model  assumes  that 


$$\text{Mean Time to Death} = B_0 + B_1 \log_{10}(1 + C)$$

with a common scale parameter in each group. The results of fitting this model to the data are shown in Figure XV.10. The coefficient $B_0$ represents the estimated intercept and, since $\log_{10}(1 + C) = 0$ for the control group, coincides with the estimated mean time to death in the control group (37.37 days). The coefficient $B_1$ represents the regression slope; it is negative and statistically significant at the five percent level.

The maximum value of the log likelihood under the eight parameter model fit shown in Figure XV.9 is -115.5285. The maximum value of the log likelihood under the three parameter model fit shown in Figure XV.10 is -117.3918. Under the hypothesis that the linear regression is adequate to describe the trends in mean time to death, -2 times the difference of the log likelihoods (three parameter model minus eight parameter model) is asymptotically distributed as chi square with 8 - 3 = 5 d.f. This provides an asymptotic test of the adequacy of the linear regression model; namely

$$-2[-117.3918 - (-115.5285)] = 3.7266$$

Since this value is not significant according to the chi square distribution with five degrees of freedom, we do not reject the hypothesis that the linear regression model is adequate.

An additional fit was carried out in which a single distribution of time to death was fitted to all seven groups. The maximum value of the log likelihood under this two parameter model is -123.6182. Under the hypothesis that $B_1 = 0$ in the previous linear regression model, -2 times the difference of the log likelihoods (two parameter model minus three parameter model) is asymptotically distributed as chi square with 3 - 2 = 1 d.f. This provides an asymptotic test that $B_1 = 0$; namely

$$-2[-123.6182 - (-117.3918)] = 12.4528$$

Since this value is highly significant according to the chi square distribution with one degree of freedom, we reject the hypothesis that $B_1 = 0$.

We thus base estimates of mean time to death and of probability of death before various times on the model fit shown in Figure XV.10.

Figure XV.11 contains estimates of the mean time to death for each concentration group and of the probabilities of dying by 7, 14, or 21 days. These estimates are based on the linear regression model fit. Each group can be identified by the transformed value of its concentration, $\log_{10}(1 + C)$, shown under C1.

The estimated probabilities of death by 21 days and their associated standard errors according to the linear regression model and according to the standard binomial estimates are:

Group    N    p_reg   Std Err_reg   p_bin   Std Err_bin
  1     20    0.083      0.050       0.1       0.067
  2     10    0.224      0.056       0.1       0.095
  3     10    0.282      0.055       0.1       0.095
  4     10    0.349      0.058       0.3       0.145
  5     10    0.411      0.065       0.6       0.155
  6     10    0.486      0.078       0.5       0.158
  7     10    0.561      0.092       0.6       0.155

The two estimates of 21 day mortality are reasonably comparable for groups 1, 4, 6, and 7 and differ somewhat, due to smoothing, for groups 2, 3, and 5. The standard errors of the regression estimates are much reduced relative to those for the binomial estimates. For example, for group 7 the estimated 21 day mortalities are similar whether the regression estimate or the standard binomial estimate is used. However, the ratio of the standard errors of the estimates is 0.155/0.092 = 1.685. This implies that it requires $(1.685)^2 = 2.84$ times as many daphnids to obtain comparable precision with the binomial based estimate as would be obtained with the regression estimate. The corresponding squared ratios for all but group 1 are close to or in excess of 3.

Thus utilizing the actual times to death, and regression models to relate these times to death across concentrations, can result in substantially improved inference sensitivity as compared with the standard binomial theory estimates based on 21 day mortalities. Statistical procedures based on the times to death also provide mortality estimates at time points other than 21 days, such as 7 days or 14 days.

A normal probability plot of the residuals from this fit (not shown) was prepared and shows no evidence of departures from the model assumptions.

The previous techniques were based on a parametric regression model to describe the distribution of time to failure and to relate it to the level of test concentration. The previous models were based on the assumption that time to death was normally distributed, with constant variance across groups.

Cox [34] proposed a regression model that does not require specifying the form of the distribution of time to death. Let X denote a predictor variable; in our case X would be some function of concentration, such as $\log_{10}(1 + C)$. Let $F_X(t)$ denote the cumulative distribution function of time to death of daphnids associated with predictor variable X and let $f_X(t)$ denote the corresponding probability density function. The hazard function, $h_X(t)$, is defined as

$$h_X(t) = f_X(t)/[1 - F_X(t)]$$
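To connect this definition with the parametric models above, the sketch below evaluates the hazard implied by a normal time-to-death distribution, which rises steadily with time. The mean of 35 days and standard deviation of 12 days are hypothetical values, not fitted estimates, and the function name is ours.

```python
from statistics import NormalDist

def normal_hazard(t, mean, sigma):
    """Hazard h(t) = f(t) / [1 - F(t)] for a normal time-to-death distribution."""
    dist = NormalDist(mean, sigma)
    return dist.pdf(t) / (1.0 - dist.cdf(t))

for day in (7, 14, 21):
    print(day, round(normal_hazard(day, mean=35.0, sigma=12.0), 4))
```

Cox's model, in contrast, leaves the shape of the baseline hazard unspecified.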


Cox [34] proposed the proportional hazards regression model

$$h_X(t) = e^{\beta X} h_0(t)$$

where $h_0(t)$ is the hazard function associated with a reference distribution. Neither $h_0(t)$ nor its associated $F_0(t)$ needs to be specified; in this sense the model is nonparametric. The proportional hazards model implies that the cumulative distribution functions of time to death are related by

$$1 - F_X(t) = [1 - F_0(t)]^{e^{\beta X}}$$

This model can be fitted to the data using a conditional maximum likelihood analysis suggested by Cox. This model was fitted to the times to death in Chapman's chromium data using the P2L program in the BMDP statistical computing system. The results of this fit are shown in Figure XV.12. The estimated value of β is $\hat\beta = 0.8427$, and $\hat\beta$ is statistically significant.

The fitted model is thus

$$1 - F_X(t) = [1 - F_0(t)]^{e^{0.8427 X}}$$

where $X = \log_{10}(1 + C)$. The survival probabilities shown in Figure XV.12 correspond to $X = \bar{X} = 1.4825$. Thus $1 - F_{\bar{X}}(21) = 0.7654$ and

$$1 - F_X(21) = (0.7654)^{e^{0.8427(X - \bar{X})}}$$

Substituting the values of X corresponding to each test concentration yields


Group   Conc.   X - Xbar   1 - F_X(21)   F_X(21)
  1        0    -1.4825       0.926        0.074
  2       13    -0.3364       0.818        0.182
  3       29    -0.0054       0.766        0.234
  4       66     0.3436       0.700        0.300
  5      132     0.6414       0.632        0.368
  6      294     0.9873       0.541        0.459
  7      655     1.3344       0.439        0.561
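The substitution can be scripted directly from the fitted relation $1 - F_X(21) = 0.7654^{\exp[0.8427(X - \bar{X})]}$. A minimal Python sketch (the function name is ours):

```python
import math

def ph_survival(s_at_mean, beta, x_minus_xbar):
    """Survival under proportional hazards: S_X(t) = S_mean(t) ** exp(beta * (X - Xbar)),
    where S_mean(t) is the survival probability at the average covariate value."""
    return s_at_mean ** math.exp(beta * x_minus_xbar)

deviations = [-1.4825, -0.3364, -0.0054, 0.3436, 0.6414, 0.9873, 1.3344]  # X - Xbar, Groups 1-7
for group, dx in enumerate(deviations, start=1):
    surv = ph_survival(0.7654, 0.8427, dx)
    print(group, round(surv, 3), round(1.0 - surv, 3))
```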
FIGURE XV.1 REPEATED MEASURES ANALYSIS OF VARIANCE OF NUMBERS OF OFFSPRING ON DAYS 11, 13, 15, 18, AND 21. GOULDEN ISOPHORONE - GROUP 6 DELETED. SURVIVING DAPHNIDS.

FIGURE XV.2 HISTOGRAM, SUMMARY STATISTICS, AND ANALYSIS OF VARIANCE COMPARISONS OF AVG COMPONENT OF NUMBERS OF OFFSPRING ON DAYS 11 TO 21. GOULDEN ISOPHORONE - GROUP 6 DELETED. SURVIVING DAPHNIDS.

FIGURE XV.3 HISTOGRAM, SUMMARY STATISTICS, AND ANALYSIS OF VARIANCE COMPARISONS OF LIN COMPONENT OF NUMBERS OF OFFSPRING ON DAYS 11 TO 21. GOULDEN ISOPHORONE - GROUP 6 DELETED. SURVIVING DAPHNIDS.
APPENDIX AIV  NESTED ANALYSIS OF VARIANCE MODEL TO TEST
FOR BEAKER TO BEAKER VARIATION WITHIN GROUPS


In  this  appendix  we  specify  the  nested  analysis  of  variance  model  that 
was  used  to  test  beaker  to  beaker  variation  within  groups.  This  model  is 
a  nested  model  with  beakers  nested  within  groups  and  daphnids  nested  within 
beakers.  Groups  are  fixed  effects  while  beakers  and  daphnids  are  random 
effects. 

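Let $i \leftrightarrow$ group, $i = 1, \ldots, I$;
$j \leftrightarrow$ beaker, $j = 1, \ldots, J_i$;
$k \leftrightarrow$ daphnid, $k = 1, \ldots, N_{ij}$.

Let $Y_{ijk}$ denote the length of the k-th daphnid within the j-th beaker of the i-th group. Then

$$Y_{ijk} = \mu + a_i + b_{j(i)} + c_{k(ij)} + \epsilon_{ijk}$$

$$\sum_i a_i = 0, \qquad b_{j(i)} \overset{\text{ind}}{\sim} N(0, \sigma_b^2), \qquad c_{k(ij)} \overset{\text{ind}}{\sim} N(0, \sigma_c^2), \qquad \epsilon_{ijk} \overset{\text{ind}}{\sim} N(0, \sigma_\epsilon^2)$$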

The $a_i$'s are the fixed group effects. The $b_{j(i)}$'s are the random beaker effects. The $c_{k(ij)}$'s are the random daphnid effects due to biological variation and to experimental variation. The $\epsilon_{ijk}$'s are random measurement errors. Since we measure each daphnid just once, this source of variation cannot be separated from the biological and experimental variation, and so we incorporate it into the daphnid-to-daphnid variation in subsequent discussion.


Let

$$\bar{Y}_{ij\cdot} = \sum_k Y_{ijk}/N_{ij}, \qquad N_{i+} = \sum_j N_{ij}, \qquad N_{++} = \sum_i N_{i+},$$

$$\bar{Y}_{i\cdot\cdot} = \sum_j N_{ij}\bar{Y}_{ij\cdot}/N_{i+}, \qquad \bar{Y}_{\cdot\cdot\cdot} = \sum_i N_{i+}\bar{Y}_{i\cdot\cdot}/N_{++}.$$

Averages of the b's and c's, denoted in the same way, have analogous interpretations.


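The analysis of variance table is given on the next page. In the special case of a balanced design (i.e., $J_i = J$ and $N_{ij} = N$ for all i, j) the expected mean squares for groups, beakers within groups, and daphnids within beakers are

$$\sigma_c^2 + N\sigma_b^2 + \frac{JN}{I-1}\sum_{i=1}^{I} a_i^2, \qquad \sigma_c^2 + N\sigma_b^2, \qquad \text{and} \qquad \sigma_c^2$$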

Under the null hypothesis of no beaker-to-beaker variation within groups (i.e., $\sigma_b^2 = 0$) the F-ratio of the beakers-within-groups mean square to the daphnids-within-beakers mean square has a central F distribution with degrees of freedom $\sum_i (J_i - 1)$ and $\sum_i \sum_j (N_{ij} - 1)$. Under the alternative hypothesis it has a complicated, nonstandard distribution. However, in the balanced special case, the alternative distribution is $(1 + N\sigma_b^2/\sigma_c^2)$ times a central F distribution with degrees of freedom $I(J-1)$ and $IJ(N-1)$.
[Analysis of variance table with Sum of Squares and Expected Mean Square columns]
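For the balanced case, the F-ratio above can be computed in a few lines. The sketch below assumes the lengths are arranged as an array of shape (groups, beakers, daphnids per beaker); the function name and the simulated lengths are hypothetical. A p-value could then be obtained from the central F distribution, for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is installed.

```python
import numpy as np

def beaker_f_test(y):
    """F ratio for beaker-to-beaker variation within groups in a balanced nested
    design; y has shape (I groups, J beakers, N daphnids per beaker)."""
    y = np.asarray(y, dtype=float)
    I, J, N = y.shape
    beaker_means = y.mean(axis=2)                    # Ybar_ij.
    group_means = beaker_means.mean(axis=1)          # Ybar_i.. (balanced case)
    ss_beakers = N * np.sum((beaker_means - group_means[:, None]) ** 2)
    ss_daphnids = np.sum((y - beaker_means[:, :, None]) ** 2)
    df_beakers, df_daphnids = I * (J - 1), I * J * (N - 1)
    f_ratio = (ss_beakers / df_beakers) / (ss_daphnids / df_daphnids)
    return f_ratio, df_beakers, df_daphnids

# Hypothetical lengths: 3 groups x 4 beakers x 5 daphnids with random beaker effects.
rng = np.random.default_rng(1)
lengths = 4.0 + rng.normal(0.0, 0.3, size=(3, 4, 1)) + rng.normal(0.0, 0.2, size=(3, 4, 5))
print(beaker_f_test(lengths))
```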


APPENDIX  AV.  ALTERNATIVE  ADJUSTMENT  PROCEDURE  TO  ACCOUNT  FOR 
HETEROGENEITY  OF  MORTALITY  RESPONSES  IN  THE 
PRESENCE  OF  UNEQUAL  SAMPLE  SIZES  ACROSS  BEAKERS 
WITHIN  GROUPS  OR  ACROSS  GROUPS 


In  subsection  A  we  discussed  a  relatively  simple  procedure  for  adjusting 
sample  sizes  to  reflect  heterogeneity  of  mortality  rates  among  beakers  within 
groups.  That  procedure  was  based  on  an  assumption  of  equal  numbers  of 
daphnids  per  beaker  within  each  group.  Such  an  equal  sample  size  assumption 
is  often  reasonable  in  toxicity  tests  with  daphnids  because  beakers  are 
usually  started  with  equal  numbers  of  daphnids  at  the  outset  of  the  test. 

In particular this was the case in LeBlanc's Tests A and B and in Adams' test with selenium. There are, however, some situations when the assumption of equal sample sizes might not be valid. For example, in Goulden's test with isophorone there are ten beakers per group. Three of those beakers start the test with five daphnids each while the other seven start the test with just one daphnid each. If we wish to combine mortality results from the individually housed daphnids with those from the multiply housed daphnids, then we may need to adjust for heterogeneity across beakers having unequal sample sizes. Similar situations arise when inferences for a given response are to be based on survivors up to a certain stage. For example, in the fathead minnow early life stage tests the fry surviving for 30 days were examined for abnormality. Tank-to-tank variation in 30 day mortality within groups causes unequal sample sizes with respect to the fry abnormality response. A similar situation would arise in the analysis of data from toxicity tests with daphnids if it is of interest to compare conditional mortality across groups. For example, it might be of interest to compare mortality rates across groups based only on the latter part of the life stage. Thus 21 day mortality might be compared across groups using responses only from survivors after 14 days. This would estimate the probability that a daphnid survives for 21 days given that it has survived for 14 days.

Variation  in  14  day  survival  among  beakers  within  groups  causes  unequal 
sample  sizes  with  respect  to  conditional  21  day  survival. 

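Adjustments in the presence of such unequal sample sizes can be carried out with a full maximum likelihood fit based on the beta binomial model (Williams [3]). Namely, within the i-th group, it is assumed that $X_{ij}$ has a beta binomial distribution with parameters $(N_{ij}, p_i, \theta_i)$. The probability function of $X_{ij}$ is then

$$P(X_{ij} = x) = \binom{N_{ij}}{x}\frac{\prod_{r=0}^{x-1}(p_i + r\theta_i)\prod_{r=0}^{N_{ij}-x-1}(1 - p_i + r\theta_i)}{\prod_{r=0}^{N_{ij}-1}(1 + r\theta_i)}, \qquad x = 0, 1, \ldots, N_{ij}$$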
The parameters $(p_i, \theta_i)$ can be estimated by maximum likelihood analysis. This would require specialized computer programs. The hypothesis $H_0: \theta_1 = \theta_2 = \cdots = \theta_I$ can be tested based on asymptotic maximum likelihood theory. Adjustments can then be carried out using separate estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_I$ or a common estimate $\hat\theta$. The adjustment factor within the j-th beaker of the i-th group is then

$$K_{ij} = \frac{1 + N_{ij}\hat\theta_i}{1 + \hat\theta_i} \qquad \text{or} \qquad K_{ij} = \frac{1 + N_{ij}\hat\theta}{1 + \hat\theta}$$

The  responses  within  each  beaker  are  then  adjusted  based  on  these  factors. 
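The beta binomial in this $(p, \theta)$ form inflates the binomial variance by exactly the factor $(1 + N\theta)/(1 + \theta)$, which is why that factor serves as the adjustment. A minimal Python check, assuming Williams' parameterization of the probability function (θ = 0 gives the ordinary binomial); the function name and the values N = 5, p = 0.3, θ = 0.2 are illustrative.

```python
from math import comb, prod

def beta_binomial_pmf(x, n, p, theta):
    """Beta binomial probability in the (p, theta) parameterization:
    C(n,x) prod_{r<x}(p + r theta) prod_{r<n-x}(1 - p + r theta) / prod_{r<n}(1 + r theta)."""
    num = prod(p + r * theta for r in range(x)) * prod(1 - p + r * theta for r in range(n - x))
    den = prod(1 + r * theta for r in range(n))
    return comb(n, x) * num / den

n, p, theta = 5, 0.3, 0.2
probs = [beta_binomial_pmf(x, n, p, theta) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(var / (n * p * (1 - p)), (1 + n * theta) / (1 + theta))  # the two ratios agree
```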

If the $N_{ij} = N_i$ are approximately constant within groups but vary across groups, then $K_{ij} = K_i = (1 + N_i\theta_i)/(1 + \theta_i)$. Equal degrees of extrabinomial variation across groups would imply that $K_i = (1 + N_i\theta)/(1 + \theta)$. We obtain a pooled estimate of θ by calculating $\hat K_i = \mathrm{Var}(P_{ij})/[u_i(1 - u_i)/N_i]$ as before (bounding it between 1 and $N_i$) and calculating $\hat\theta_i = (\hat K_i - 1)/(N_i - \hat K_i)$. We pool the $\hat\theta_i$'s across groups by calculating the weighted average

$$\hat\theta = \frac{\sum_i (J_i - 1)\,\hat\theta_i}{\sum_i (J_i - 1)}$$

The adjustment factors

$$\hat K_i = \frac{1 + N_i\hat\theta}{1 + \hat\theta}$$

are then used within each group.
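A minimal sketch of the pooling step follows. The function names are hypothetical, the inputs are made up, and the weights (here the number of beakers per group minus one) are an assumption. A $\hat K_i$ at its upper bound $N_i$ corresponds to an unbounded $\hat\theta_i$, which the sketch reports as infinity.

```python
import math

def theta_from_k(k_hat, n):
    """Invert K = (1 + n theta)/(1 + theta) after bounding K_hat to [1, n]."""
    k = min(max(k_hat, 1.0), float(n))
    return math.inf if k >= n else (k - 1.0) / (n - k)

def pooled_adjustments(k_hats, ns, weights):
    """Weighted average of per-group theta estimates and the resulting
    adjustment factors K_i = (1 + N_i theta)/(1 + theta)."""
    thetas = [theta_from_k(k, n) for k, n in zip(k_hats, ns)]
    theta = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    return theta, [(1.0 + n * theta) / (1.0 + theta) for n in ns]

# Hypothetical inputs: K_hat_i, daphnids per beaker N_i, weights J_i - 1.
theta, factors = pooled_adjustments(k_hats=[1.4, 2.0, 0.8], ns=[5, 5, 5], weights=[3, 3, 3])
print(round(theta, 4), [round(f, 4) for f in factors])
```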
APPENDIX AX


CONFIDENCE INTERVAL ON CONCENTRATION THAT
CORRESPONDS TO A GIVEN LEVEL OF INCREASE
IN MORTALITY RATE OVER CONTROL GROUP RATE


This appendix is patterned after Appendix AXV in Feder and Collins [1].

After we have fitted the probit model to the mortality data by nonlinear regression, we wish to calculate confidence bounds on the concentration, $C_L$, that results in an increase in mortality of L above the control group rate, $p_0$. Schematically,


Let $z \equiv \log_{10}(\text{conc}) - m$ and $z_L \equiv \log_{10} C_L - m$. We fit a probit model in terms of z. We want a point estimate and confidence interval on $C_L$ such that $\Phi(\beta_0 + \beta_1 z_L) = L$. Let $\theta \equiv (p_0, \beta_0, \beta_1)$. We estimate θ by $\hat\theta$, the maximum likelihood estimate based on the probit model $p(\theta; \text{conc}) = p_0 + (1 - p_0)\Phi(\beta_0 + \beta_1 z)$, using the program BMDPAR. Thus $z_L$ satisfies the equation

$$\beta_0 + \beta_1 z_L = \Phi^{-1}(L) \equiv f_L$$

$$z_L = (f_L - \beta_0)/\beta_1 = g(\beta_0, \beta_1; L)$$


We construct a confidence interval on $z_L$ based on the estimate $\hat z_L$ and the delta method (Cramér [27], pp. 366-367). Asymptotically, as sample size increases, the function $g(\hat\beta_0, \hat\beta_1; L)$ can be approximated by a first order Taylor expansion about $(\beta_0, \beta_1)$, and

$$\hat z_L \approx z_L + (\hat\beta_0 - \beta_0)\,\partial g/\partial\beta_0 + (\hat\beta_1 - \beta_1)\,\partial g/\partial\beta_1$$

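Under certain standard regularity assumptions, the asymptotic distribution of (appropriate standardizations of) $\hat z_L$ can be shown to be the same as that of its Taylor approximation, namely normal with mean $z_L$ and variance

$$\mathrm{Var}(\hat z_L) \approx \left(\frac{\partial g}{\partial\beta_0}, \frac{\partial g}{\partial\beta_1}\right)\hat\Sigma\left(\frac{\partial g}{\partial\beta_0}, \frac{\partial g}{\partial\beta_1}\right)'$$

where $\hat\Sigma$ is the estimated variance-covariance matrix of $(\hat\beta_0, \hat\beta_1)$ and $\partial g/\partial\beta_0$, $\partial g/\partial\beta_1$ are the first derivatives of g evaluated at $\hat\beta_0$ and $\hat\beta_1$. The matrix $\hat\Sigma$ is determined from the BMDPAR output. The derivatives of g are

$$\partial g/\partial\beta_0 = -1/\beta_1, \qquad \partial g/\partial\beta_1 = -(f_L - \beta_0)/\beta_1^2 = -z_L/\beta_1$$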

The asymptotic $1-\alpha$ confidence interval on $z_L$ is thus

$$z_L \in \hat z_L \pm \zeta_{\alpha/2}[\widehat{\mathrm{Var}}(\hat z_L)]^{1/2}$$

where $\zeta_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.

These  calculations  can  easily  be  programmed  on  a  computer  or 
calculator. 
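As an illustration of how little programming is needed, here is a minimal Python sketch using `statistics.NormalDist` for $\Phi^{-1}$. The function name is hypothetical, the covariance entries would in practice come from the BMDPAR output, and the conversion back through $C = 10^{m+z}$ follows from the definition of z. Because g does not involve $p_0$, only the $(\beta_0, \beta_1)$ block of the covariance matrix is needed.

```python
import math
from statistics import NormalDist

def concentration_for_increase(b0, b1, cov, L, m, alpha=0.05):
    """Point estimate and delta-method 1 - alpha interval for C_L, where
    Phi(b0 + b1 z_L) = L and z = log10(conc) - m. cov is the 2x2 covariance
    matrix of (b0, b1). Returns (C_L, lower, upper) on the concentration scale."""
    f_L = NormalDist().inv_cdf(L)
    z_L = (f_L - b0) / b1
    g0, g1 = -1.0 / b1, -z_L / b1                    # dg/db0, dg/db1
    var = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var)
    return 10 ** (m + z_L), 10 ** (m + z_L - half), 10 ** (m + z_L + half)

# Hypothetical probit fit: b0 = -2, b1 = 1.5, m = 1, L = 0.10 (10 percent increase).
cov = [[0.04, 0.01], [0.01, 0.02]]
print(concentration_for_increase(-2.0, 1.5, cov, L=0.10, m=1.0))
```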




APPENDIX  AXII.1  THEORY  UNDERLYING  POINT  AND  CONFIDENCE  INTERVAL 
ESTIMATION  OF  CONCENTRATIONS  ASSOCIATED  WITH 
SPECIFIED  REDUCTIONS  IN  AVERAGE  REPRODUCTION  OR 
LENGTH  RELATIVE  TO  THE  CONTROL  GROUP 


Let $x \equiv \log_{10}(\text{concentration})$, $v = (x - m)/s$, and I = indicator of treatment groups. We fit the regression model

$$Y = \mu + \beta_0 I + \beta_1 I v + \epsilon$$

or

$$Y = \mu + \beta_0 I + \beta_1 I v + \beta_2 I v^2 + \epsilon$$

where ε represents the random variation, assumed to be independently distributed with mean 0 and variance $\sigma^2$. Let c denote a specified incremental response from the control group average. We wish to estimate the value of v such that

$$\beta_0 + \beta_1 v = c$$

or

$$\beta_0 + \beta_1 v + \beta_2 v^2 = c$$

and place confidence intervals on this value. The starting point for these inferences is the output from the regression analysis program, which provides estimates of the model parameters $(\hat\beta_0, \hat\beta_1)$ or $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ and the estimated variance-covariance matrix of these parameters, which we denote as $\hat\Sigma$. Let ν denote the residual degrees of freedom. We consider, in turn, the straight line and the quadratic cases.


Straight Line Case

We solve the equation

$$(\beta_0 - c) + \beta_1 v = 0$$

thus

$$v_c = -(\beta_0 - c)/\beta_1$$

The point estimate, $\hat v_c$, of $v_c$ is thus

$$\hat v_c = -(\hat\beta_0 - c)/\hat\beta_1$$
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We  construct  an  approximate  confidence  interval  onvc  by  use  of  the 
delta  method  (sometimes  referred  to  as  propagation  of  errors).  Let  vch 
g ($o> &l) •  Then  vc=g(60»^l)-  Expand  g(Bo.Bi)  in  a  first  order  Taylor 
expansion  about  (30,3-| ) .  Thus 


vc=g(B0,S-| )  =  g(80,6 1  )+(60-B0)3  g/3B0+($  i-  Bi  )3  g/3B  i+remainder 


The remainder becomes small asymptotically as the sample size gets large. Approximating $\hat v_c$ by the first order terms in the expansion, we obtain the result (loosely stated) that $\hat v_c$ is asymptotically normally distributed with mean $v_c$ and variance


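$$\operatorname{Var}(\hat v_c) = (\partial g/\partial\beta_0, \partial g/\partial\beta_1)\,\Sigma\,(\partial g/\partial\beta_0, \partial g/\partial\beta_1)'$$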


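We approximate this asymptotic variance by substituting $\hat\Sigma$ for $\Sigma$ and by substituting $(\hat\beta_0, \hat\beta_1)$ for $(\beta_0, \beta_1)$ in the expressions for $\partial g/\partial\beta_0$ and $\partial g/\partial\beta_1$. Let $\widehat{\operatorname{Var}}(\hat v_c)$ denote this estimated variance.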

To complete the characterization of the asymptotic distribution of $\hat v_c$, we need to specify the functional forms of the derivatives. These are


$$\partial g/\partial\beta_0 = -1/\beta_1$$

$$\partial g/\partial\beta_1 = (\beta_0 - c)/\beta_1^2$$


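An approximate $1-\alpha$ confidence interval on $v_c$ is

$$v_c \in \hat v_c \pm t(1-\alpha/2;\nu)\,[\widehat{\operatorname{Var}}(\hat v_c)]^{1/2} \equiv (v_\ell, v_u)$$

where $t(1-\alpha/2;\nu)$ is the upper $\alpha/2$ point of the $t$ distribution with $\nu$ degrees of freedom.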

To  translate  this  point  estimate  and  confidence  interval  to  an  estimate 
and  confidence  interval  directly  on  concentration,  we  use  the  relation 


$$\text{conc} = 10^{m+sv}$$

thus

$$\text{conc}_c = 10^{m+s\hat v_c}, \quad \text{conc}_\ell = 10^{m+s v_\ell}, \quad \text{conc}_u = 10^{m+s v_u}$$


These are the inferential values of interest.
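The straight line calculation can be sketched in a few lines of Python. This is a minimal illustration of the delta-method steps above, not the CONFINT program itself; the function name `straight_line_ci` is hypothetical, and the caller is assumed to supply the critical value $t(1-\alpha/2;\nu)$ (for example, from a t table), because the Python standard library has no t quantile function.

```python
import math

def straight_line_ci(b0, b1, cov, c, t_crit, m=0.0, s=1.0):
    """Delta-method estimate and CI for v_c, the root of b0 + b1*v = c.

    b0, b1 : regression estimates of beta_0, beta_1
    cov    : 2x2 estimated covariance matrix of (b0, b1)
    c      : specified change from the control mean, in absolute y units
    t_crit : t(1 - alpha/2; nu), supplied by the caller (assumption)
    m, s   : rescaling constants in v = (x - m)/s, x = log10(concentration)
    Returns (v_hat, v_lower, v_upper) and the same three values as concentrations.
    """
    v_hat = -(b0 - c) / b1
    # Partial derivatives of g(b0, b1) = -(b0 - c)/b1, evaluated at the estimates
    grad = (-1.0 / b1, (b0 - c) / b1 ** 2)
    # Quadratic form grad' * Sigma_hat * grad
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    half_width = t_crit * math.sqrt(var)
    v_vals = (v_hat, v_hat - half_width, v_hat + half_width)
    concs = tuple(10 ** (m + s * v) for v in v_vals)  # conc = 10**(m + s*v)
    return v_vals, concs
```

With the illustrative values $\hat\beta_0 = -2$, $\hat\beta_1 = -4$ and $c = -10$ (no rescaling, so $m = 0$, $s = 1$), the sketch gives $\hat v_c = 2$, that is, an estimated concentration of $10^2 = 100$. Because $s > 0$, the transformation to concentration preserves the order of the interval endpoints.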


Quadratic  Case 


This is conceptually very similar to the straight line case; however, there are several technical complications due to the greater complexity of the model.

We  solve  the  equation 

$$(\beta_0 - c) + \beta_1 v + \beta_2 v^2 = 0$$


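The roots of the equation are:

$$v_c = \begin{cases} \dfrac{-\beta_1 \pm [\beta_1^2 - 4\beta_2(\beta_0 - c)]^{1/2}}{2\beta_2} & \text{if } \beta_2 \neq 0 \\[1ex] -(\beta_0 - c)/\beta_1 & \text{if } \beta_2 = 0 \end{cases}$$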


We considered the $\beta_2 = 0$ case in the discussion of the straight line case. We therefore assume below that $\beta_2 \neq 0$.

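The point estimate, $\hat v_c$, of $v_c$ is

$$\hat v_c = \frac{-\hat\beta_1 \pm [\hat\beta_1^2 - 4\hat\beta_2(\hat\beta_0 - c)]^{1/2}}{2\hat\beta_2} \quad \text{if } \hat\beta_2 \neq 0,\ \hat\beta_1^2 - 4\hat\beta_2(\hat\beta_0 - c) > 0$$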

We will generally be interested in the larger root. Which root is larger depends on the values of the coefficients. Rather than attempt to choose the appropriate root a priori, we calculate point and confidence interval estimates of both roots and then choose the more appropriate one from the context of the problem.


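We construct approximate confidence intervals on the roots by use of the delta method, much as we did in the straight line case. Let

$$v_- = f_-(\beta_0, \beta_1, \beta_2) = \frac{-\beta_1 - [\beta_1^2 - 4\beta_2(\beta_0 - c)]^{1/2}}{2\beta_2}$$

$$v_+ = f_+(\beta_0, \beta_1, \beta_2) = \frac{-\beta_1 + [\beta_1^2 - 4\beta_2(\beta_0 - c)]^{1/2}}{2\beta_2}$$

Then $\hat v_\pm = f_\pm(\hat\beta_0, \hat\beta_1, \hat\beta_2)$, and first order Taylor expansions about $(\beta_0, \beta_1, \beta_2)$ give

$$\hat v_+ = f_+(\hat\beta_0, \hat\beta_1, \hat\beta_2) = f_+(\beta_0, \beta_1, \beta_2) + (\hat\beta_0 - \beta_0)\,\partial f_+/\partial\beta_0 + (\hat\beta_1 - \beta_1)\,\partial f_+/\partial\beta_1 + (\hat\beta_2 - \beta_2)\,\partial f_+/\partial\beta_2 + \text{remainder}$$

$$\hat v_- = f_-(\hat\beta_0, \hat\beta_1, \hat\beta_2) = f_-(\beta_0, \beta_1, \beta_2) + (\hat\beta_0 - \beta_0)\,\partial f_-/\partial\beta_0 + (\hat\beta_1 - \beta_1)\,\partial f_-/\partial\beta_1 + (\hat\beta_2 - \beta_2)\,\partial f_-/\partial\beta_2 + \text{remainder}$$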

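Approximating $\hat v_+$, $\hat v_-$ by the first order terms in the expansions, we obtain the result that $\hat v_+$, $\hat v_-$ are asymptotically normally distributed with means $v_+$, $v_-$ and variances

$$\operatorname{Var}(\hat v_-) = (\partial f_-/\partial\beta_0, \partial f_-/\partial\beta_1, \partial f_-/\partial\beta_2)\,\Sigma\,(\partial f_-/\partial\beta_0, \partial f_-/\partial\beta_1, \partial f_-/\partial\beta_2)'$$

$$\operatorname{Var}(\hat v_+) = (\partial f_+/\partial\beta_0, \partial f_+/\partial\beta_1, \partial f_+/\partial\beta_2)\,\Sigma\,(\partial f_+/\partial\beta_0, \partial f_+/\partial\beta_1, \partial f_+/\partial\beta_2)'$$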


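We approximate these asymptotic variances by substituting $\hat\Sigma$ for $\Sigma$ and $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ for $(\beta_0, \beta_1, \beta_2)$ in the expressions for the derivatives. Let $\widehat{\operatorname{Var}}(\hat v_-)$, $\widehat{\operatorname{Var}}(\hat v_+)$ denote these estimated variances. Approximate $1-\alpha$ confidence intervals on $v_-$, $v_+$ are

$$v_- \in \hat v_- \pm t(1-\alpha/2;\nu)\,[\widehat{\operatorname{Var}}(\hat v_-)]^{1/2} \equiv (v_{\ell-}, v_{u-})$$

$$v_+ \in \hat v_+ \pm t(1-\alpha/2;\nu)\,[\widehat{\operatorname{Var}}(\hat v_+)]^{1/2} \equiv (v_{\ell+}, v_{u+})$$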
where $t(1-\alpha/2;\nu)$ is the upper $\alpha/2$ point of the $t$ distribution with $\nu$ degrees of freedom. We then transform these estimates and confidence intervals in the usual manner to obtain estimates and confidence intervals directly on concentration. Namely, we use the relation

$$\text{conc} = 10^{m+sv}$$


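To complete the characterization of the asymptotic distributions of $\hat v_-$, $\hat v_+$, we need to specify the functional forms of the derivatives. Writing $V = [\beta_1^2 - 4\beta_2(\beta_0 - c)]^{1/2}$, these are

$$\partial f_-/\partial\beta_0 = V^{-1}, \quad \partial f_-/\partial\beta_1 = \frac{-1 - V^{-1}\beta_1}{2\beta_2}, \quad \partial f_-/\partial\beta_2 = \frac{V^{-1}(\beta_0 - c)}{\beta_2} + \frac{\beta_1 + V}{2\beta_2^2}$$

$$\partial f_+/\partial\beta_0 = -V^{-1}, \quad \partial f_+/\partial\beta_1 = \frac{-1 + V^{-1}\beta_1}{2\beta_2}, \quad \partial f_+/\partial\beta_2 = -\frac{V^{-1}(\beta_0 - c)}{\beta_2} + \frac{\beta_1 - V}{2\beta_2^2}$$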
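A corresponding Python sketch for the quadratic case computes both roots, the derivatives listed above, and the resulting intervals on the $v$ scale. As before, `quadratic_root_cis` is a hypothetical name and the critical value $t(1-\alpha/2;\nu)$ is supplied by the caller. The choice between the two roots is left to the user, as the text recommends; note that $v_+ > v_-$ when $\beta_2 > 0$ and $v_- > v_+$ when $\beta_2 < 0$.

```python
import math

def quadratic_root_cis(b0, b1, b2, cov, c, t_crit):
    """Delta-method estimates and CIs for both roots of (b0 - c) + b1*v + b2*v**2 = 0.

    Assumes b2 != 0. cov is the 3x3 estimated covariance matrix of (b0, b1, b2);
    t_crit = t(1 - alpha/2; nu) is supplied by the caller.
    Returns {"-": (v, lower, upper), "+": (v, lower, upper)} on the v scale,
    or None when there are no two distinct real roots.
    """
    disc = b1 ** 2 - 4.0 * b2 * (b0 - c)
    if disc <= 0:
        return None  # the two roots (and their derivatives) are not both defined
    V = math.sqrt(disc)
    roots = {"-": (-b1 - V) / (2.0 * b2), "+": (-b1 + V) / (2.0 * b2)}
    # Partial derivatives (d/db0, d/db1, d/db2) of f_- and f_+
    grads = {
        "-": (1.0 / V,
              (-1.0 - b1 / V) / (2.0 * b2),
              (b0 - c) / (V * b2) + (b1 + V) / (2.0 * b2 ** 2)),
        "+": (-1.0 / V,
              (-1.0 + b1 / V) / (2.0 * b2),
              -(b0 - c) / (V * b2) + (b1 - V) / (2.0 * b2 ** 2)),
    }
    result = {}
    for key, g in grads.items():
        # Quadratic form g' * Sigma_hat * g
        var = sum(g[i] * cov[i][j] * g[j] for i in range(3) for j in range(3))
        half_width = t_crit * math.sqrt(var)
        result[key] = (roots[key], roots[key] - half_width, roots[key] + half_width)
    return result
```

Returning `None` when $\hat\beta_1^2 - 4\hat\beta_2(\hat\beta_0 - c) \le 0$ mirrors the requirement above that this quantity be positive; each endpoint can then be mapped to concentration with $10^{m+sv}$, exactly as in the straight line case.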
APPENDIX AXII.2  CONFINT: A COMPUTER PROGRAM TO CALCULATE POINT AND
CONFIDENCE INTERVAL ESTIMATES OF CONCENTRATIONS
ASSOCIATED WITH SPECIFIED REDUCTIONS IN AVERAGE
REPRODUCTION OR LENGTH RELATIVE TO THE CONTROL GROUP


In  Appendix  AXII.1  we  discussed  the  theory  underlying  the  construction 
of  point  and  confidence  interval  estimates  of  concentrations  associated 
with  specified  reductions  in  average  length  or  reproduction  by  means  of  the 
delta  method.  In  this  appendix  we  describe  the  features  and  use  of  a 
computer  program  to  implement  that  theory.  The  program  was  written  by 
Claire  Matthews.  Applications  of  the  program  are  illustrated  in  the  body 
of  the  section. 


General  Description 

CONFINT is a FORTRAN computer program which calculates a point estimate and 95 percent confidence interval for the concentration producing a specified change, $c$, in the average response relative to the control group average. The program assumes that the response parameter of interest ($y$) has a linear or quadratic dose-response relationship with logarithmic concentration ($x$) among the treatment groups. Therefore the underlying model is $y = \mu$ for the control group data, and


$$y = f(x) = \mu + \beta_0 + \beta_1 x \qquad (1)$$

or

$$y = f(x) = \mu + \beta_0 + \beta_1 x + \beta_2 x^2 \qquad (2)$$

for treatment group data, depending on whether a linear or quadratic model is appropriate. The program calculates the root(s) of the equation $f(x) = \mu + c$, where $c$ represents the extent of change relative to the control group response, as specified by the user.

Estimates of the coefficients of the above model are input requirements to CONFINT. The most straightforward way to obtain these estimates is to first run a standard multiple linear regression analysis, using any standard statistical package. To incorporate data from both the control group and the treatment groups in the same regression run, define an indicator variable $I$, where $I = 1$ for treatment groups and $I = 0$ for the control group. Then create independent variables $I$, $Ix$, and $Ix^2$, and use the regression program to fit the model

$$y = \mu + \beta_0 I + \beta_1 I x + e \qquad (1A)$$

or

$$y = \mu + \beta_0 I + \beta_1 I x + \beta_2 I x^2 + e \qquad (2A)$$

This yields the estimates of $\mu$ and the $\{\beta_i\}$.
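As an alternative to a packaged regression program, the indicator-variable fit of model (1A) or (2A) can be sketched with NumPy least squares. This is an illustration under the assumption that NumPy is available (the function `fit_indicator_model` is hypothetical); it returns the coefficient estimates, the estimated variance-covariance matrix $\hat\sigma^2 (X'X)^{-1}$ restricted to the $\{\beta_i\}$ (excluding $\mu$), and the error degrees of freedom, which are the quantities CONFINT expects.

```python
import numpy as np

def fit_indicator_model(x, y, treated, quadratic=False):
    """Least-squares fit of y = mu + b0*I + b1*I*x (+ b2*I*x**2) + e.

    x       : log10 concentration (control values are ignored, since I = 0)
    treated : 1 for treatment-group observations, 0 for controls
    Returns (estimates, cov_of_betas, error_df); estimates are (mu, b0, b1[, b2])
    and cov_of_betas excludes mu.
    """
    I = np.asarray(treated, float)
    # Control x values are irrelevant; replace them so that -inf * 0 cannot give nan
    x = np.where(I > 0, np.asarray(x, float), 0.0)
    y = np.asarray(y, float)
    cols = [np.ones_like(I), I, I * x]
    if quadratic:
        cols.append(I * x ** 2)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = len(y) - X.shape[1]            # error degrees of freedom
    sigma2 = resid @ resid / df         # residual mean square
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, cov[1:, 1:], df
```

If the rescaled variable $v = (x-m)/s$ is used, the $v$ values are passed in place of $x$.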

In addition to the regression coefficient estimates, CONFINT requires the estimated variance-covariance matrix of the $\{\beta_i\}$ coefficients. The SPSS regression program, for example, can print out this matrix through one of its regression output options.


It is often desirable, from a numerical analysis standpoint, to rescale the $x$ values to $v = (x-m)/s$ in Equations (1A) and (2A). This reduces correlations among the regression coefficients and scales the regression coefficients and variance-covariance matrix elements so that they are all of about the same order of magnitude, with none so large or so small that significant digits are lost in the regression output. Any values can be chosen for $m$ and $s$; however, they must be recorded, since they must also be specified as input parameters to CONFINT. A convenient choice of $m$ and $s$, although not the only one, is the mean and standard deviation, respectively, of the $x$ values of the treatment group observations.
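The suggested choice of $m$ and $s$ can be sketched as follows, assuming (as the text does) that $x = \log_{10}(\text{concentration})$ and using the sample standard deviation; the helper names are hypothetical, and any other recorded $m$, $s$ would serve equally well.

```python
import math
import statistics

def rescale_constants(treatment_concs):
    """Return m, s and the rescaled values v = (x - m)/s for treatment concentrations."""
    x = [math.log10(conc) for conc in treatment_concs]
    m = statistics.mean(x)
    s = statistics.stdev(x)  # sample standard deviation (n - 1 divisor)
    return m, s, [(xi - m) / s for xi in x]

def to_concentration(v, m, s):
    """Invert the rescaling: conc = 10**(m + s*v)."""
    return 10 ** (m + s * v)
```

For treatment concentrations 1, 10 and 100, $x = 0, 1, 2$, so $m = 1$, $s = 1$ and $v = -1, 0, 1$.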


Program  Input  Specifications 


The  input  specifications  for  each  problem,  read  into  CONFINT  on  (input 
unit)  TAPE5,  consist  of  a  set  of  six  or  seven  cards  depending  on  whether  a 
linear  or  quadratic  model  is  used.  For  each  set  of  input  parameters  the 
program  will  calculate  one  set  of  point  estimates  and  95  percent  confidence 
intervals  corresponding  to  the  user-specified  change,  c,  from  the  mean 
control  response. 

The  input  cards  are  to  be  arranged  as  follows: 


Card   Columns  Format*  Variable
1      1-80     10A8     Any title occupying any of the 80 columns
2      1-5      I5       Number of degrees of freedom for error in the regression fit
       6-10     (blank)
       11-20    F10.0    Regression estimate for μ
       21-30    F10.0    Regression estimate for β0
       31-40    F10.0    Regression estimate for β1
       41-50    F10.0    Regression estimate for β2 (leave blank if a straight line
                         model is used)
3      1-5      I5       ISCALE = 0 if no scale change was used for x;
                         ISCALE = 1 if a scale change of the form v = (x-m)/s was used
       6-10     (blank)
       11-20    F10.0    Value used for m (leave blank if ISCALE = 0)
       21-30    F10.0    Value used for s (leave blank if ISCALE = 0)
4      1-5      I5       IUNIT = 1 if c is given in absolute units;
                         IUNIT = 0 if c is given as a decimal (proportion relative
                         to the control mean)
       6-10     (blank)
       11-20    F10.0    Value for c, which can be either positive or negative. For
                         example, if one is interested in the concentration
                         associated with a 50 percent reduction from the control
                         group average (i.e., an EC50), use IUNIT = 0 and C = -0.50
5      1-10     F10.0    Variance (β0)
6      1-10     F10.0    Covariance (β0, β1)
       11-20    F10.0    Variance (β1)
7**    1-10     F10.0    Covariance (β0, β2)
       11-20    F10.0    Covariance (β1, β2)
       21-30    F10.0    Variance (β2)
1      1-80     10A8     Title card for a new problem, if necessary
etc.   etc.     etc.

Cards 1-7 can be repeated as many times as desired for any number of problems. An end-of-file card should follow the last problem.


*Note: An "F10.0" format indicates that the user should enter a floating point number in the field of 10 columns (not necessarily right-justified); a decimal point must be punched, although it can appear anywhere in the number. Variables having the "I5" format, however, must be entered as right-justified integers ending in column 5.

**Card 7 should be omitted if a linear model is used. Note that cards 5-7 are set up to contain the lower triangular portion of the variance-covariance matrix for the $\{\beta_i\}$ regression coefficients.
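The card layout above can be illustrated with a short Python sketch that writes one problem's input deck as text lines. The functions `f10`, `i5` and `confint_deck` are hypothetical helpers, not part of CONFINT. The sketch assumes that an exponent form such as `1.0000E-07` is accepted in an F10.0 input field (check this against the FORTRAN compiler actually used), and it does not write the end-of-file card.

```python
def f10(value):
    """Format a float for an F10.0 field: at most 10 columns, decimal point always present."""
    for digits in range(7, 0, -1):
        text = f"{value:#.{digits}G}"  # '#' keeps the decimal point and trailing zeros
        if len(text) <= 10:
            return text.rjust(10)
    raise ValueError(f"{value!r} does not fit in 10 columns")


def i5(value):
    """Format an integer right-justified in columns 1-5 (I5 field)."""
    text = f"{int(value):5d}"
    if len(text) > 5:
        raise ValueError(f"{value!r} does not fit in 5 columns")
    return text


def confint_deck(title, df, mu, betas, cov, c, iunit, m=None, s=None):
    """Build cards 1-6 (linear) or 1-7 (quadratic) for one CONFINT problem.

    betas : (b0, b1) or (b0, b1, b2); cov : matching covariance matrix.
    m, s  : rescaling constants, or None when ISCALE = 0.
    """
    blank = " " * 5  # columns 6-10 are left blank on cards 2-4
    cards = [title[:80]]                                                      # card 1
    cards.append(i5(df) + blank + f10(mu) + "".join(f10(b) for b in betas))  # card 2
    if m is None:
        cards.append(i5(0))                                                   # card 3, ISCALE = 0
    else:
        cards.append(i5(1) + blank + f10(m) + f10(s))                         # card 3, ISCALE = 1
    cards.append(i5(iunit) + blank + f10(c))                                  # card 4
    for row in range(len(betas)):                                             # cards 5-7
        # Lower triangular portion of the covariance matrix, one row per card
        cards.append("".join(f10(cov[row][col]) for col in range(row + 1)))
    return "\n".join(cards)
```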
A separate output is produced for each problem input to CONFINT.

Each output first includes a printout of the input specifications, including the model chosen by the user, regression coefficients, rescaling transformation (if any), error degrees of freedom, and the estimated covariance matrix of the $\{\beta_i\}$ regression coefficients. The value of $c$ is printed in absolute $y$ units; if $c$ was read in as a decimal using relative units, then the program recalculates $c$ as $c\hat\mu$ before printing it back.

The program reports calculations for both roots of the equation $f(x) = \mu + c$ if a quadratic model is used and if two roots exist. (If no roots exist, the program prints out a message to this effect before going on to the next problem.) The partial derivatives of the polynomial roots with respect to the $\beta$'s are printed out. From these derivatives and the estimated variance-covariance matrix, the estimated variances of the roots are calculated and printed out.

Finally,  the  program  prints  out  the  point  estimates  of  the  roots  and 
their  associated  95  percent  confidence  intervals,  in  three  different 
scaling  systems: 

(1) in rescaled units, if $x$ was rescaled to $v = (x-m)/s$,

(2)  in  log-concentration  units  (regular  x  units),  and 

(3)  in  raw  concentrations,  where  antilogarithms  are  taken  of  all  the 
values  reported  in  (2). 

In  the  event  that  x  already  represented  raw  concentrations,  the  values 
reported  for  (3)  should  be  ignored. 

It should be noted that whenever the quadratic model yields two roots, the smaller of the two roots may be much lower than the lowest treatment concentration. It is calculated and printed out for mathematical completeness, but it should probably be disregarded because it may be totally unrealistic in the physical context of the experiment. The specific root to be used depends on the context of the problem, but it will usually be the larger root.


APPENDIX  AXIII.  THEORY  UNDERLYING  THE  CONSTRUCTION  OF  CONFIDENCE  INTERVALS 
ON  TREATMENT  GROUP-CONTROL  GROUP  MORTALITY  RATE 
DIFFERENCES  BASED  ON  THE  RESULTS  OF  THREE  PARAMETER  PROBIT 
MODEL  REGRESSION  FITS 


Let

$$p(\text{conc}) = p_0 + (1-p_0)\,\Phi(\beta_0 + \beta_1(Z-m))$$

denote the three parameter probit model. The notations are the same as those discussed in the body of Subsection XIII.C. The model is fitted to the data by maximum likelihood estimation, as discussed in detail in Section X. Let $\hat p_0, \hat\beta_0, \hat\beta_1$ denote the parameter estimates and let $\sigma(\hat p_0), \sigma(\hat\beta_0), \sigma(\hat\beta_1), R$ denote their estimated asymptotic standard errors and the asymptotic correlation matrix. These estimates are obtained directly from the outputs describing the probit model fits. Several such outputs are illustrated in Section X. The estimated asymptotic variance-covariance matrix of $\hat p_0, \hat\beta_0, \hat\beta_1$ is thus

$$\hat\Sigma = \begin{pmatrix} \sigma(\hat p_0) & 0 & 0 \\ 0 & \sigma(\hat\beta_0) & 0 \\ 0 & 0 & \sigma(\hat\beta_1) \end{pmatrix} R \begin{pmatrix} \sigma(\hat p_0) & 0 & 0 \\ 0 & \sigma(\hat\beta_0) & 0 \\ 0 & 0 & \sigma(\hat\beta_1) \end{pmatrix}$$

Let $Z_0, Z_i$ denote the $Z$-values corresponding to the concentrations at the control group and at treatment group $i$, respectively. The difference between the mortality rates at these two groups is

$$f(p_0, \beta_0, \beta_1; Z_0, Z_i) \equiv (1-p_0)\,[\Phi(\beta_0 + \beta_1(Z_i - m)) - \Phi(\beta_0 + \beta_1(Z_0 - m))]$$

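This difference is estimated by substituting $\hat p_0, \hat\beta_0, \hat\beta_1$ in the above expression.

To construct a confidence interval on the difference we must calculate the asymptotic standard error of the estimated difference. This is done by the delta method. Namely,

$$\partial f/\partial p_0 = -[\Phi(\beta_0 + \beta_1(Z_i - m)) - \Phi(\beta_0 + \beta_1(Z_0 - m))]$$

$$\partial f/\partial\beta_0 = (1-p_0)\,[\phi(\beta_0 + \beta_1(Z_i - m)) - \phi(\beta_0 + \beta_1(Z_0 - m))]$$

$$\partial f/\partial\beta_1 = (1-p_0)\,[(Z_i - m)\,\phi(\beta_0 + \beta_1(Z_i - m)) - (Z_0 - m)\,\phi(\beta_0 + \beta_1(Z_0 - m))]$$

$$\operatorname{Var}(\hat f) = (\partial f/\partial p_0, \partial f/\partial\beta_0, \partial f/\partial\beta_1)\,\hat\Sigma\,(\partial f/\partial p_0, \partial f/\partial\beta_0, \partial f/\partial\beta_1)'$$

$$\text{Std err}(\hat f) = [\operatorname{Var}(\hat f)]^{1/2}$$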

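In the above expressions, $\hat f$, $\partial f/\partial p_0$, $\partial f/\partial\beta_0$, $\partial f/\partial\beta_1$ are obtained by substituting $\hat p_0, \hat\beta_0, \hat\beta_1$ for $p_0, \beta_0, \beta_1$. The expression $\Phi(\cdot)$ denotes the standard normal cumulative distribution function and $\phi(\cdot)$ denotes the standard normal probability density function. If logarithmic concentration is used and if concentration is 0 at the control group, then $\Phi(\beta_0 + \beta_1(Z_0 - m))$, $\phi(\beta_0 + \beta_1(Z_0 - m))$, and $(Z_0 - m)\,\phi(\beta_0 + \beta_1(Z_0 - m))$ are set identically to 0.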


A $1-\alpha$ level two sided confidence interval on the difference is constructed as

$$(\hat f - z_{1-\alpha/2}\,\text{Std err}(\hat f),\ \hat f + z_{1-\alpha/2}\,\text{Std err}(\hat f))$$

where $z_{1-\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.
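The delta-method interval for a treatment-control mortality difference can be sketched with Python's `statistics.NormalDist`, which supplies $\Phi$, $\phi$ and the normal quantile. This is an illustrative sketch under the notation above, not the program used to produce the report's results; the function name is hypothetical, and passing `z0=None` stands for the case in which the control concentration is 0 on a logarithmic scale, so that the control-group terms are set to 0.

```python
import math
from statistics import NormalDist

def mortality_diff_ci(p0, b0, b1, se, R, zi, m, z0=None, alpha=0.05):
    """Delta-method CI on f = (1-p0)[Phi(b0+b1(zi-m)) - Phi(b0+b1(z0-m))].

    se : (se_p0, se_b0, se_b1), estimated asymptotic standard errors
    R  : 3x3 estimated asymptotic correlation matrix of (p0, b0, b1)
    z0 : control-group Z value, or None if control-group terms are set to 0
    Returns (f_hat, std_err, (lower, upper)).
    """
    N = NormalDist()  # standard normal: cdf = Phi, pdf = phi
    # Sigma_hat = D R D with D = diag(se)
    cov = [[se[r] * R[r][k] * se[k] for k in range(3)] for r in range(3)]
    ui = b0 + b1 * (zi - m)
    Phi_i, phi_i, zphi_i = N.cdf(ui), N.pdf(ui), (zi - m) * N.pdf(ui)
    if z0 is None:
        Phi_0 = phi_0 = zphi_0 = 0.0
    else:
        u0 = b0 + b1 * (z0 - m)
        Phi_0, phi_0, zphi_0 = N.cdf(u0), N.pdf(u0), (z0 - m) * N.pdf(u0)
    f = (1 - p0) * (Phi_i - Phi_0)
    # Partial derivatives with respect to p0, b0, b1
    grad = (-(Phi_i - Phi_0), (1 - p0) * (phi_i - phi_0), (1 - p0) * (zphi_i - zphi_0))
    var = sum(grad[r] * cov[r][k] * grad[k] for r in range(3) for k in range(3))
    std_err = math.sqrt(var)
    z = N.inv_cdf(1 - alpha / 2)  # upper alpha/2 point of the standard normal
    return f, std_err, (f - z * std_err, f + z * std_err)
```

For $k$ simultaneous intervals, a Bonferroni adjustment (which the text notes was not applied to these intervals) would amount to calling the function with `alpha / k`.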

Simultaneity can be adjusted for by Bonferroni's method, but this has not been done for these intervals.